This LMSYS Blog article covers work I participated in with the Boson AI and SGLang-Omni team. I am keeping the full write-up on LMSYS, where it belongs, and using this page as a lightweight note for readers who follow my work on inference systems and omni serving.
Higgs Audio v3 TTS has to handle streaming input, preserve voice identity and delivery as text arrives, support multilingual speech, and expose control over emotion, style, prosody, and sound effects. SGLang-Omni's role is to make that kind of heterogeneous pipeline servable: stages can have different schedulers, memory budgets, communication paths, and streaming behavior while still looking like one end-to-end service.
For TTS and omni models, the serving path often looks more like preprocessing -> audio_encoder -> tts_engine -> vocoder than like "one model generates tokens until done." Working on this kind of pipeline makes it very clear that CUDA-graph-friendly runners, streaming vocoder scheduling, stage-level memory isolation, and low-overhead communication between stages are first-class parts of the model-serving story.
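To make the "pipeline of stages" picture concrete, here is a minimal sketch, assuming a toy single-process asyncio model. It is not SGLang-Omni's API. The stage names come from the pipeline above. `serve`, `run_stage`, `make_stage`, and the `END` sentinel are hypothetical names for illustration. Each stage runs as its own worker connected by bounded queues, so text chunks stream through as they arrive, and a slow stage applies backpressure upstream. Real stages would be separate models with their own schedulers, GPU memory budgets, and transport. This sketch deliberately leaves out CUDA graphs, memory isolation, and cross-process communication.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable, Sequence

# Hypothetical end-of-stream sentinel; each stage forwards it downstream exactly once.
END = object()

StageFn = Callable[[object], Awaitable[object]]

STAGES = ["preprocessing", "audio_encoder", "tts_engine", "vocoder"]


def make_stage(label: str) -> StageFn:
    """Toy stage that tags each chunk with its name (stand-in for real model work)."""
    async def fn(chunk: object) -> object:
        await asyncio.sleep(0)  # yield control, as a real stage would while waiting on the GPU
        return f"{label}({chunk})"
    return fn


async def run_stage(fn: StageFn, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """One worker per stage: pull a chunk, transform it, push the result downstream."""
    while True:
        item = await inbox.get()
        if item is END:
            await outbox.put(END)
            return
        await outbox.put(await fn(item))


async def serve(
    text_chunks: Iterable[str], stage_names: Sequence[str], maxsize: int = 2
) -> AsyncIterator[object]:
    """Stream chunks through every stage; bounded queues apply backpressure between stages."""
    queues = [asyncio.Queue(maxsize=maxsize) for _ in range(len(stage_names) + 1)]
    workers = [
        asyncio.create_task(run_stage(make_stage(name), queues[i], queues[i + 1]))
        for i, name in enumerate(stage_names)
    ]

    async def feed() -> None:
        # Text arrives incrementally; push each chunk into the first stage as it becomes available.
        for chunk in text_chunks:
            await queues[0].put(chunk)
        await queues[0].put(END)

    feeder = asyncio.create_task(feed())

    # Yield each output as soon as the last stage emits it, rather than waiting for the whole utterance.
    while True:
        item = await queues[-1].get()
        if item is END:
            break
        yield item
    await asyncio.gather(feeder, *workers)


async def main() -> None:
    async for audio_chunk in serve(["Hello", "world"], STAGES):
        print(audio_chunk)


if __name__ == "__main__":
    asyncio.run(main())
```

Running it prints one line per input chunk, for example `vocoder(tts_engine(audio_encoder(preprocessing(Hello))))`, in input order. Giving each stage its own worker and queue is what lets stages be scheduled independently while the caller still sees a single streaming service.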
Read the full article on LMSYS.