About Wenyao Gao

Hi there and welcome! I'm Wenyao Gao, and I work as an AI Engineer, focusing on building and optimizing systems for serving large language models at scale.

My work involves designing high-performance inference pipelines, understanding the intricacies of GPU memory management, and implementing efficient batching strategies. I'm particularly interested in frameworks like vLLM and SGLang that push the boundaries of what's possible in LLM serving.

This blog is where I document what I learn, share technical deep-dives, and occasionally reflect on lessons from both successes and failures. If you're working on similar problems or just curious about ML infrastructure, I hope you find something useful here.

Get in Touch

Feel free to reach out through any of the following channels:

GitHub, X (Twitter), LinkedIn, Email