Tag

KV cache compression

3 articles

TurboQuant turns vLLM KV cache into 3-bit storage

Tools & Apps/May 20

I break down TurboQuant’s vLLM cache compression and give you a copy-ready setup for 3-bit KV cache and fallback paths.

TurboQuant brings near-optimal online vector quantization

Research/Apr 29

TurboQuant is an online, accelerator-friendly vector quantizer that targets near-optimal MSE and inner-product distortion.

TurboQuant, Fast Cold Starts, and Rust on GPUs

Tools & Apps/Apr 3

TurboQuant cuts KV cache use by 4.6x, GPU state restoration slashes cold starts, and Rust is moving deeper into CUDA work.