Understanding Transformer Inference Optimization
A comprehensive look at modern techniques for optimizing transformer model inference.
Breaking down attention mechanisms from self-attention to multi-head attention.