Ming-Omni on SGLang: Architecture and the Optimizations Behind Fast Omni Serving
How we serve Ming-Omni in SGLang: the unified multimodal architecture and the optimizations behind fast omni serving (an encoder kernel fix, tensor parallelism, CFM CUDA-graph capture, and streaming TTS).
Higgs Audio v3 TTS on SGLang-Omni
A short pointer to the LMSYS article on work I contributed to: serving Higgs Audio v3 TTS with SGLang-Omni for real-time, controllable voice agents.
Omni Serving Design Through the Lens of Ming-Omni
Using Ming-flash-omni-2.0 as a case study to compare the design tradeoffs of vLLM-Omni and SGLang-Omni across the thinker, talker, streaming, and audio output.