Projects

Things I've built and worked on.

Mini SGLang

A minimal implementation of SGLang for understanding LLM inference optimization techniques, including continuous batching and KV cache management.

llm-inference, python, research
Python, PyTorch, CUDA

vLLM Performance Profiler

Diagnostic tool for analyzing vLLM request latency, KV cache utilization, and queue behavior under load.

tooling, python, performance
Python, Prometheus, Grafana

Attention Visualizer

Interactive visualization tool for exploring attention patterns in transformer models.

visualization, typescript, research
TypeScript, React, D3.js

CUDA Memory Tracker

Library for tracking and debugging GPU memory allocations in PyTorch applications.

tooling, python, cuda
Python, CUDA, PyTorch

Personal Website

This website, built with Next.js, TypeScript, and Tailwind CSS. It features blog posts, a project showcase, and a responsive design.

web, typescript, open-source
Next.js, TypeScript, Tailwind CSS, shadcn/ui