About

Wenyao Gao

Hi there and welcome! I'm Wenyao Gao. I build and optimize systems for serving large language models — inference engines, KV cache management, batching, and the GPU infrastructure underneath.

My work involves designing high-performance inference pipelines, managing GPU memory, and implementing efficient batching strategies. I'm particularly interested in LLM serving frameworks like vLLM and SGLang.

This blog is where I document what I learn, share technical deep-dives, and occasionally reflect on lessons from both successes and failures. If you're working on similar problems or just curious about ML infrastructure, I hope you find something useful here.

Get in Touch

Feel free to reach out through any of the following channels: