Model Support in a VLM Serving Stack Is Not a Checkbox - It Is a Six-Layer Systems Contract
Why real multimodal model support is a six-layer serving-stack contract, from API extraction to decoder reentry.
All the articles I've posted.
How I designed, implemented, and hardened a cost-efficient RAG chatbot for my personal site with citations, streaming, and build-time indexing.
A deep dive into the Mini SGLang architecture, covering system design, engine initialization, the KV cache, and the lifecycle of a single request.