TurboQuant
Redefining AI Efficiency with Extreme Compression
Three theoretically grounded quantization algorithms (TurboQuant, PolarQuant, and QJL) achieve extreme compression for large language models and vector search engines. By randomly rotating input vectors and then applying a 1-bit Quantized Johnson-Lindenstrauss (QJL) transform to the residual, TurboQuant attains near-optimal distortion, within a small constant factor (≈ 2.7×) of the information-theoretic lower bound.
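The 1-bit step can be illustrated with a standard sign-of-random-projection sketch (SimHash-style), which is the classical idea behind a 1-bit JL transform: project with a Gaussian matrix, keep only the signs, and recover the angle between two vectors from the fraction of matching bits via the identity θ ≈ π·(1 − agreement). This is a minimal generic sketch, not the paper's exact TurboQuant/QJL construction; the matrix `S`, the dimensions, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 200, 4096              # input dim (e.g. GloVe d=200), number of 1-bit measurements
S = rng.standard_normal((m, d))  # Gaussian projection; a stand-in for the paper's random rotation

def one_bit_sketch(v):
    # Keep only the sign of each projected coordinate: 1 bit per measurement.
    return np.sign(S @ v)

x = rng.standard_normal(d)
y = rng.standard_normal(d)
qx, qy = one_bit_sketch(x), one_bit_sketch(y)

# Fraction of agreeing bits -> angle estimate -> cosine-similarity estimate.
agree = np.mean(qx == qy)
theta_est = np.pi * (1.0 - agree)
cos_est = np.cos(theta_est)
cos_true = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
```

With enough measurements, `cos_est` concentrates around `cos_true`, showing how aggressive 1-bit quantization can still preserve inner-product geometry for nearest-neighbour search.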
Validated on LongBench, Needle In A Haystack, ZeroSCROLLS, RULER, and L-Eval across Gemma, Mistral, and Llama-3.1-8B. In nearest-neighbour search (GloVe d=200), TurboQuant achieves optimal 1@k recall while reducing indexing time to virtually zero.
Amir Zandieh · Majid Daliri · Majid Hadian · Vahab Mirrokni · Praneeth Kacham · Insu Han · Lars Gottesbüren · Rajesh Jayaram — Google Research