vLLM - Why Requests Take Hours Under Load
Why vLLM requests can take 2-3 hours under heavy load: analyzing KV cache block exhaustion and queue starvation.
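The failure mode named above can be illustrated with a toy scheduler. This is a minimal sketch, not vLLM's implementation: the pool size, per-request block count, and names (`TOTAL_BLOCKS`, `BLOCKS_PER_REQUEST`, `schedule`) are all hypothetical, chosen only to show how a fixed KV cache block pool starves queued requests once running requests hold every block.

```python
from collections import deque

# Hypothetical numbers for illustration only.
TOTAL_BLOCKS = 8          # size of the KV cache block pool
BLOCKS_PER_REQUEST = 4    # blocks a request must acquire to be scheduled

free_blocks = TOTAL_BLOCKS
running = []
waiting = deque(f"req{i}" for i in range(5))

def schedule():
    """Admit waiting requests while free blocks remain; the rest stay queued."""
    global free_blocks
    while waiting and free_blocks >= BLOCKS_PER_REQUEST:
        req = waiting.popleft()
        free_blocks -= BLOCKS_PER_REQUEST
        running.append(req)

schedule()
# Only 2 of 5 requests fit in the pool; the other 3 are starved until a
# running request finishes and releases its blocks. Under sustained load,
# a request at the back of this queue can wait for hours.
print(running, list(waiting), free_blocks)
```

Long-generation requests make this worse: they hold their blocks for the full decode, so the queue drains at the rate of the slowest running requests rather than the arrival rate.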