Stop using torch.cat for your KV cache implementations
Published on March 25, 2026 | 7 min read
Tags: llm, kv-cache, pytorch, inference, optimization, transformers
tl;dr: `torch.cat` is not in-place; use pre-allocated buffers instead
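The tl;dr above can be sketched quickly. This is a minimal illustration of the idea (using NumPy rather than PyTorch to keep it self-contained, and with made-up helper names): concatenation reallocates and copies the whole cache on every decode step, while a pre-allocated buffer is written in place.

```python
import numpy as np

# Growing a cache with concatenation: each step allocates a fresh
# array and copies every previous entry (quadratic total copying).
def cache_with_concat(steps, d):
    cache = np.empty((0, d))
    for t in range(steps):
        new_kv = np.full((1, d), float(t))  # stand-in for this step's key/value
        cache = np.concatenate([cache, new_kv], axis=0)  # full copy each step
    return cache

# Pre-allocated buffer: reserve max_len rows up front, write in place,
# and track how many rows are filled.
def cache_with_buffer(steps, d, max_len=16):
    buf = np.zeros((max_len, d))
    for t in range(steps):
        buf[t] = float(t)  # in-place write, no reallocation
    return buf[:steps]  # view of the filled prefix

assert np.array_equal(cache_with_concat(4, 2), cache_with_buffer(4, 2))
```

In PyTorch the same pattern would pre-allocate the cache tensor once and fill slices of it during decoding, avoiding a new allocation per token.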
Energy-Based Models: From Basics to LLMs
Published on March 17, 2026 | 25 min read
Tags: llm, energy-based-model, likelihood, score-matching, diffusion
A whirlwind tour of energy-based model (EBM) paradigms, from the basics to modern LLM applications. Based on my talk at the Toronto LLM Meetup.
Understanding DeepSeek's Multi-Head Latent Attention (MLA)
Published on February 9, 2026 | 40 min read
Tags: llm, attention, transformers, deepseek, mla, kv-cache, inference
On bottlenecks in attention, KV caching, long-context decoding, attention variants, and how DeepSeek MLA came to be. Part 1 of the FlashMLA blog series.
Data Quality Is All You Need?
Published on May 23, 2025 | 35 min read
Tags: llm, pretraining, midtraining, posttraining, data-quality, synthetic-data, dpo
Notes on Microsoft's phi-4 data pipeline for pre-training, 'mid-training', supervised fine-tuning, and preference optimization.