Projects

Things I've built and worked on.

Mini SGLang

A minimal implementation of SGLang for understanding LLM inference optimization techniques, including continuous batching and KV cache management.

llm-inference, python, research
Python, PyTorch, CUDA

vLLM Performance Profiler

Diagnostic tool for analyzing vLLM request latency, KV cache utilization, and queue behavior under load.

tooling, python, performance
Python, Prometheus, Grafana

CUDA Memory Tracker

Library for tracking and debugging GPU memory allocations in PyTorch applications.

tooling, python, cuda
Python, CUDA, PyTorch