RESEARCH
ICLR 2026 · GOOGLE RESEARCH

Compression, evals, and the agentic substrate.

Research that closes the gap between frontier models and production deployment.

Published 24 Mar 2026

TurboQuant — redefining AI efficiency with extreme compression.

TurboQuant is a theoretically grounded two-stage quantisation algorithm that achieves near-optimal distortion rates across all bit-widths. It first randomly rotates the input vectors (PolarQuant), then applies a 1-bit QJL (Quantized Johnson-Lindenstrauss) correction to the residual. The result is 3-bit zero-loss KV-cache compression that needs no training or fine-tuning and can be deployed in real-time production systems.

— Amir Zandieh, Majid Daliri, Vahab Mirrokni et al.
  Google Research
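
The two-stage idea in the abstract above can be illustrated with a small, standard-library-only Python sketch. It is not the authors' implementation, and several pieces are stand-ins chosen for brevity: the random rotation is a randomized Hadamard transform, the first-stage quantiser is a plain uniform grid rather than an optimised codebook, and the split of 2 bits for the first stage plus 1 sign bit for the residual is an assumption made so the total comes to 3 bits per coordinate. All function names (`rotate`, `quantize_uniform`, `qjl_encode`, `qjl_inner`, `demo`) are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hadamard(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def rotate(x, signs):
    """Random sign flips then Hadamard: an orthogonal map (stand-in rotation)."""
    return hadamard([s * v for s, v in zip(signs, x)])


def quantize_uniform(x, bits, limit):
    """Snap each coordinate to one of 2**bits evenly spaced levels in [-limit, limit]."""
    levels = 2 ** bits
    step = 2 * limit / (levels - 1)
    codes = [min(levels - 1, max(0, round((v + limit) / step))) for v in x]
    return codes, [c * step - limit for c in codes]


def qjl_encode(residual, proj):
    """Keep one sign bit per Gaussian projection, plus the residual norm."""
    bits = [1 if dot(row, residual) >= 0 else -1 for row in proj]
    return bits, math.sqrt(dot(residual, residual))


def qjl_inner(bits, norm, proj, query):
    """Estimate <query, residual> from the sign bits.

    Uses E[sign(<s, r>) * <s, q>] = sqrt(2/pi) * <r, q> / ||r|| for Gaussian s.
    """
    total = sum(b * dot(row, query) for b, row in zip(bits, proj))
    return math.sqrt(math.pi / 2) * norm * total / len(proj)


def demo(dim=64, bits=2, seed=0):
    """Return (exact, first-stage-only, residual-corrected) inner products."""
    rng = random.Random(seed)
    key = [rng.gauss(0, 1) for _ in range(dim)]
    query = [rng.gauss(0, 1) for _ in range(dim)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    proj = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]

    key_rot, query_rot = rotate(key, signs), rotate(query, signs)
    limit = max(abs(v) for v in key_rot)
    _, key_hat = quantize_uniform(key_rot, bits, limit)   # stage 1
    residual = [a - b for a, b in zip(key_rot, key_hat)]
    sign_bits, norm = qjl_encode(residual, proj)          # stage 2

    exact = dot(query, key)
    coarse = dot(query_rot, key_hat)
    corrected = coarse + qjl_inner(sign_bits, norm, proj, query_rot)
    return exact, coarse, corrected


if __name__ == "__main__":
    print(demo())
```

Because the rotation is orthogonal, the inner product of the rotated query and key equals the original one, so attention scores can be estimated entirely in the rotated space. The residual term is an unbiased estimate, not an exact correction: on any single vector pair it may or may not land closer to the exact value than the first-stage estimate alone. The sketch also ignores the small per-vector overhead of storing `limit` and `norm`.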

32-BIT INPUT · 100%
4.0 bytes / value
3-BIT OUTPUT · 16.7%
0.375 bytes / value
  • 6× memory reduction
  • 8× faster on H100
  • 0% accuracy loss
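
The storage figures above follow from simple bit arithmetic, sketched below in Python. The helper names `bytes_per_value` and `relative_size` are hypothetical and exist only for this illustration.

```python
def bytes_per_value(bits):
    """Storage per value in bytes for a given bit-width."""
    return bits / 8


def relative_size(bits, baseline_bits=32):
    """Payload size as a fraction of the baseline bit-width."""
    return bits / baseline_bits


if __name__ == "__main__":
    print(bytes_per_value(32))               # 4.0
    print(bytes_per_value(3))                # 0.375
    print(relative_size(3))                  # 0.09375
    print(round(1 / relative_size(3), 2))    # 10.67
    print(round(1 / 6, 3))                   # 0.167
```

The per-value byte counts match the tiles exactly. The raw 3-bit payload is about 9.4% of a 32-bit value (roughly a 10.7× reduction), whereas the quoted 16.7% is 1/6 and therefore agrees with the 6× memory-reduction figure. The page does not say what accounts for the difference; any explanation, such as per-vector metadata or a different baseline precision, would be an assumption.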