
Published on March 25, 2026 | 7 min read
Stop using torch.cat for your KV cache implementations
llm kv-cache pytorch inference optimization transformers
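tl;dr: `torch.cat` is not in-place. Every call allocates a new tensor and copies the entire existing cache into it. Instead, pre-allocate a buffer once and write each new token's keys and values into it.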
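The sketch below contrasts the two approaches. It assumes a `(batch, num_heads, seq_len, head_dim)` layout with the sequence on dim 2 and a maximum sequence length known up front. `append_with_cat` and `StaticKVCache` are hypothetical names chosen for illustration, not a PyTorch API and not necessarily how this post implements it. With `torch.cat`, decode step t copies all t cached positions, so generating T tokens copies O(T²) elements in total. The pre-allocated version copies only the new positions at each step.

```python
import torch


def append_with_cat(cache, new):
    # Anti-pattern: torch.cat (called without out=) allocates a new output
    # tensor and copies every element of `cache` into it, so step t copies O(t) entries.
    return torch.cat([cache, new], dim=2)


class StaticKVCache:
    """Hypothetical pre-allocated KV cache, for illustration only.

    Assumed layout: (batch, num_heads, max_seq_len, head_dim), sequence on dim 2.
    """

    def __init__(self, batch, num_heads, max_seq_len, head_dim,
                 dtype=torch.float32, device="cpu"):
        shape = (batch, num_heads, max_seq_len, head_dim)
        # Allocate both buffers once; later appends only write into them.
        self.k = torch.zeros(shape, dtype=dtype, device=device)
        self.v = torch.zeros(shape, dtype=dtype, device=device)
        self.max_seq_len = max_seq_len
        self.length = 0  # number of positions filled so far

    def append(self, new_k, new_v):
        """Copy new_k/new_v into the next free positions; return views of the filled prefix."""
        end = self.length + new_k.shape[2]
        if end > self.max_seq_len:
            raise ValueError("KV cache capacity exceeded")
        # In-place copy of only the new positions into the existing storage.
        self.k[:, :, self.length:end].copy_(new_k)
        self.v[:, :, self.length:end].copy_(new_v)
        self.length = end
        # Slicing returns views that share the buffers' storage; no data is copied.
        return self.k[:, :, :end], self.v[:, :, :end]


# Usage: decode 5 tokens one at a time with both approaches.
torch.manual_seed(0)
cache = StaticKVCache(batch=1, num_heads=2, max_seq_len=16, head_dim=4)
k_cat = torch.empty(1, 2, 0, 4)
v_cat = torch.empty(1, 2, 0, 4)
for _ in range(5):
    new_k = torch.randn(1, 2, 1, 4)
    new_v = torch.randn(1, 2, 1, 4)
    k_view, v_view = cache.append(new_k, new_v)
    k_cat = append_with_cat(k_cat, new_k)
    v_cat = append_with_cat(v_cat, new_v)

print(torch.equal(k_view, k_cat), torch.equal(v_view, v_cat))  # True True
```

The trade-off is that memory for all `max_seq_len` positions is reserved up front, even if generation stops early. In exchange, each step writes only the new token, and the returned keys and values are views into one fixed buffer rather than freshly allocated tensors.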
