Tag
KV cache compression
3 articles

Tools & Apps/May 20
TurboQuant turns vLLM KV cache into 3-bit storage
I break down TurboQuant’s KV cache compression in vLLM and give you a copy-ready setup for a 3-bit KV cache and its fallback paths.

Research/Apr 29
TurboQuant brings near-optimal online vector quantization
TurboQuant is an online, accelerator-friendly vector quantizer that targets near-optimal MSE and inner-product distortion.

Tools & Apps/Apr 3
TurboQuant, Fast Cold Starts, and Rust on GPUs
TurboQuant cuts KV cache memory use by 4.6x, GPU state restoration slashes cold-start times, and Rust is moving deeper into CUDA work.