Compression, evals, and the agentic substrate.

Research that closes the gap between frontier models and production deployment.

RESEARCH · ICLR 2026 · GOOGLE RESEARCH

Published 24 Mar 2026

TurboQuant — redefining AI efficiency with extreme compression.

A theoretically grounded two-stage quantisation algorithm that achieves near-optimal distortion rates, within a small constant factor of the information-theoretic lower bound, across all bit-widths. TurboQuant first randomly rotates each input vector and quantises the rotated coordinates (PolarQuant), then applies a 1-bit QJL correction to the remaining residual. The result is 3-bit KV-cache compression with zero accuracy loss, requiring no training or fine-tuning and deployable in real-time production systems.
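The two stages can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the uniform quantiser on ±3/√d is a hypothetical stand-in for the optimised scalar quantiser TurboQuant uses, PolarQuant's actual encoding is not reproduced, and the per-vector norms are kept in full precision. It does show why the rotation helps (after a random rotation, every coordinate of a unit vector has roughly the same narrow distribution, so one fixed scalar quantiser fits all of them) and why a single extra bit suffices (the sign of a Gaussian projection of the residual yields an unbiased estimate of its inner product with any query). All function names are hypothetical.

```python
import numpy as np


def random_rotation(d: int, rng: np.random.Generator) -> np.ndarray:
    """Haar-random orthogonal d x d matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flipping column signs by sign(diag(r)) makes the rotation uniformly distributed.
    return q * np.sign(np.diag(r))


def uniform_quantise(y: np.ndarray, bits: int, clip: float) -> np.ndarray:
    """b-bit uniform scalar quantiser on [-clip, clip], returning bucket midpoints.

    Hypothetical stand-in for an optimised per-coordinate codebook.
    """
    levels = 2 ** bits
    step = 2.0 * clip / levels
    idx = np.clip(np.floor((y + clip) / step), 0, levels - 1)  # integer codes 0..levels-1
    return -clip + (idx + 0.5) * step


def stage1_rotate_quantise(x: np.ndarray, rot: np.ndarray, bits: int) -> np.ndarray:
    """Stage 1 (simplified): normalise, rotate, quantise each coordinate, undo rotation."""
    norm = np.linalg.norm(x)
    y = rot @ (x / norm)  # coordinates of a randomly rotated unit vector: roughly N(0, 1/d)
    y_hat = uniform_quantise(y, bits, clip=3.0 / np.sqrt(x.size))
    return norm * (rot.T @ y_hat)


def qjl_encode(r: np.ndarray, s: np.ndarray) -> tuple[np.ndarray, float]:
    """Stage 2: keep one sign bit per row of the Gaussian projection s @ r, plus ||r||."""
    return np.sign(s @ r), float(np.linalg.norm(r))


def qjl_inner_product(q: np.ndarray, signs: np.ndarray, r_norm: float, s: np.ndarray) -> float:
    """Unbiased estimate of <q, r>, using E[sign(s_i . r)(s_i . q)] = sqrt(2/pi) <q, r> / ||r||."""
    scale = np.sqrt(np.pi / 2) * r_norm / s.shape[0]
    return float(scale * (signs @ (s @ q)))


def two_stage_inner_product(q, x, rot, s, bits):
    """Estimate <q, x> as <q, x_hat> plus the QJL estimate of <q, x - x_hat>."""
    x_hat = stage1_rotate_quantise(x, rot, bits)
    signs, r_norm = qjl_encode(x - x_hat, s)
    return float(q @ x_hat) + qjl_inner_product(q, signs, r_norm, s)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 128
    x, q = rng.standard_normal(d), rng.standard_normal(d)
    rot = random_rotation(d, rng)
    s = rng.standard_normal((d, d))  # m = d rows: one sign bit per coordinate
    est = two_stage_inner_product(q, x, rot, s, bits=2)  # 2 + 1 = 3 bits per coordinate
    print(f"exact {q @ x:.3f}, estimate {est:.3f}")
```

Splitting a 3-bit budget as 2 bits for stage 1 plus 1 sign bit for stage 2 reflects our reading of the two-stage design; the residual norm is one extra float per vector that this sketch does not count against the budget.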

— Amir Zandieh, Majid Daliri, Vahab Mirrokni et al.
Google Research

Read the paper: arXiv:2504.19874

32-bit input: 100% (4.0 bytes / value)

3-bit output: 9.4% (0.375 bytes / value)
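The per-value figures above follow directly from the bit-widths. A minimal sketch of that arithmetic, assuming each value is stored in exactly its nominal bit-width and ignoring any per-vector metadata such as norms or scales (the helper names are hypothetical). The raw 32-to-3-bit ratio works out to about 10.7×; the 6× headline below is the reported KV-cache figure and is not derived here.

```python
def bytes_per_value(bits: float) -> float:
    """Storage per value in bytes, ignoring any per-vector metadata (assumption)."""
    return bits / 8


def compression_report(in_bits: float, out_bits: float) -> tuple[float, float, float]:
    """Return (input bytes, output bytes, output size as a percentage of input)."""
    b_in, b_out = bytes_per_value(in_bits), bytes_per_value(out_bits)
    return b_in, b_out, 100 * b_out / b_in


b_in, b_out, pct = compression_report(32, 3)
print(f"{b_in} B -> {b_out} B per value ({pct:.3f}% of input, {b_in / b_out:.2f}x smaller)")
# prints: 4.0 B -> 0.375 B per value (9.375% of input, 10.67x smaller)
```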

— 6 × memory reduction
— 8 × faster on H100
— 0% accuracy loss