Higgs Audio v3 TTS on SGLang-Omni

This LMSYS Blog article covers work I participated in with the Boson AI and SGLang-Omni teams. I am keeping the full write-up on LMSYS, where it belongs, and using this page as a lightweight note for readers who follow my work on inference systems and omni serving.

Higgs Audio v3 TTS has to handle streaming input, preserve voice identity and delivery as text arrives, support multilingual speech, and expose control over emotion, style, prosody, and sound effects. SGLang-Omni's role is to make that kind of heterogeneous pipeline servable: stages can have different schedulers, memory budgets, communication paths, and streaming behavior while still looking like one end-to-end service.

For TTS and omni models, the serving path often looks more like preprocessing -> audio_encoder -> tts_engine -> vocoder than "one model generates tokens until done." Working on this kind of pipeline makes it very clear that CUDA-graph-friendly runners, streaming vocoder scheduling, stage-level memory isolation, and low-overhead stage communication are first-class parts of the model-serving story.
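To make the staged shape concrete, here is a toy sketch in plain Python. It is not SGLang-Omni code: every name in it (`preprocessing`, `tts_engine`, `vocoder`, `run_pipeline`, `frame_size`) is hypothetical, the "model" stages are stand-ins that do no real inference, and the audio_encoder stage is left out for brevity. It only illustrates how chained streaming stages let the last stage emit output before the first stage has consumed all of its input.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stage type: takes a stream of items, returns a stream of items.
Stage = Callable[[Iterator], Iterator]


def preprocessing(text_chunks: Iterator[str]) -> Iterator[str]:
    """Toy text normalization: strip whitespace and drop empty chunks."""
    for chunk in text_chunks:
        chunk = chunk.strip()
        if chunk:
            yield chunk


def tts_engine(text_chunks: Iterator[str]) -> Iterator[int]:
    """Stand-in for audio-token generation: one fake token per character."""
    for chunk in text_chunks:
        for ch in chunk:
            yield ord(ch) % 256


def vocoder(tokens: Iterator[int], frame_size: int = 4) -> Iterator[bytes]:
    """Stand-in for streaming vocoding: emit a frame every frame_size tokens."""
    frame: List[int] = []
    for tok in tokens:
        frame.append(tok)
        if len(frame) == frame_size:
            yield bytes(frame)
            frame = []
    if frame:
        # Flush the final partial frame once the token stream ends.
        yield bytes(frame)


def run_pipeline(source: Iterable, stages: List[Stage]) -> Iterator:
    """Chain stages lazily; nothing runs until the result is iterated."""
    stream: Iterator = iter(source)
    for stage in stages:
        stream = stage(stream)
    return stream


if __name__ == "__main__":
    text = ["  hi ", "", "you"]
    for frame in run_pipeline(text, [preprocessing, tts_engine, vocoder]):
        print(frame)
```

Because each stage is a generator, the first audio frame is produced as soon as enough tokens exist, which is the property a streaming TTS service cares about. A real serving stack replaces these in-process generators with separately scheduled stages that have their own memory budgets and communication paths, which is where the concerns listed above come in.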

Read the full article on LMSYS.

MOSS-TTS Local Transformer v1.5 on SGLang-Omni

Jun 17, 2026

A short pointer to the LMSYS article on work I contributed to: serving MOSS-TTS-Local Transformer v1.5 on SGLang-Omni with native streaming at 48 kHz stereo.

Ming-Omni on SGLang: Architecture and the Optimizations Behind Fast Omni Serving

Jun 8, 2026

How we serve Ming-Omni in SGLang: the unified multimodal architecture, and the optimizations — an encoder kernel fix, tensor parallelism, CFM CUDA-graph capture, and streaming TTS — behind fast omni serving.

Omni Serving Design Through the Lens of Ming-Omni

May 29, 2026

Using Ming-flash-omni-2.0 as a case study to compare vllm-omni and sglang-omni design tradeoffs across thinker, talker, streaming, and audio output.

Related posts

MOSS-TTS Local Transformer v1.5 on SGLang-Omni

Ming-Omni on SGLang: Architecture and the Optimizations Behind Fast Omni Serving

Omni Serving Design Through the Lens of Ming-Omni