Mini SGLang
A minimal implementation of SGLang for understanding LLM inference optimization techniques, including continuous batching and KV cache management (both sketched below).
llm-inference, python, research
Python, PyTorch, CUDA
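At their core, the two techniques named above are a scheduler that admits and retires requests at every decoding step (continuous batching) and an allocator that hands out fixed-size KV-cache blocks on demand and reclaims them when a request finishes (paged KV cache management). The sketch below is a minimal, hypothetical illustration of that combination, not this project's actual code: every name (`BlockAllocator`, `Request`, `step`, `serve`, `BLOCK_SIZE`, `NUM_BLOCKS`) is made up for the example, and the model forward pass is stubbed out.

```python
# Hypothetical sketch of continuous batching + paged KV-cache management.
# Not the project's actual API; the model forward pass is a stub.
from collections import deque
from dataclasses import dataclass, field

BLOCK_SIZE = 16   # tokens per KV-cache block (assumed)
NUM_BLOCKS = 256  # total blocks in the cache pool (assumed)


class BlockAllocator:
    """Hands out fixed-size KV-cache blocks and reclaims them on completion."""

    def __init__(self, num_blocks: int):
        self.free = deque(range(num_blocks))

    def alloc(self) -> int | None:
        return self.free.popleft() if self.free else None

    def free_blocks(self, blocks: list[int]) -> None:
        self.free.extend(blocks)


@dataclass
class Request:
    prompt_len: int
    max_new_tokens: int
    generated: int = 0
    blocks: list[int] = field(default_factory=list)

    @property
    def num_tokens(self) -> int:
        return self.prompt_len + self.generated


def step(batch: list[Request]) -> None:
    """Stub for one decoding step; a real engine would run the model here."""
    for req in batch:
        req.generated += 1


def serve(waiting: deque[Request]) -> None:
    allocator = BlockAllocator(NUM_BLOCKS)
    running: list[Request] = []
    while waiting or running:
        # Continuous batching: admission happens on every iteration,
        # not only after the whole previous batch has finished.
        while waiting:
            req = waiting[0]
            needed = -(-req.num_tokens // BLOCK_SIZE)  # ceil division
            if len(allocator.free) < needed:
                break
            waiting.popleft()
            req.blocks = [allocator.alloc() for _ in range(needed)]
            running.append(req)

        step(running)  # one forward pass over the whole running batch

        # After the step, grow caches for ongoing requests and retire finished ones.
        still_running = []
        for req in running:
            if req.generated >= req.max_new_tokens:
                allocator.free_blocks(req.blocks)  # reclaim KV blocks immediately
                continue
            if req.num_tokens % BLOCK_SIZE == 1:   # crossed into a new block
                block = allocator.alloc()
                if block is not None:              # a real engine would preempt here
                    req.blocks.append(block)
            still_running.append(req)
        running = still_running


if __name__ == "__main__":
    serve(deque(Request(prompt_len=32, max_new_tokens=8) for _ in range(4)))
```

The point of the sketch is the control flow: requests enter and leave the batch at block boundaries of the scheduling loop rather than between monolithic batches, and KV memory is managed as a pool of reusable fixed-size blocks instead of one contiguous buffer per request.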