Wenyao Gao

My main focus is machine learning systems and large language model inference: serving models, managing KV cache, and squeezing latency out of GPUs. I also contribute to open-source inference engines like SGLang and vLLM. This site is where I write down what I learn, including the lessons and mistakes along the way.

Check out my blog posts and projects below!

Featured

Higgs Audio v3 TTS on SGLang-Omni

A short pointer to the LMSYS article on work I contributed to: serving Higgs Audio v3 TTS with SGLang-Omni for real-time, controllable voice agents.