Implement a KV Cache for Transformer Inference
Coding · Hard · Common
kv-cache · transformer · gpu-memory · inference
Reported: 7 times
Last seen: 2026-03-25
First seen: 2025-08-10
Active in: 2025, 2026
Description
Build an efficient key-value cache for transformer model inference. Handle memory management, eviction, and multi-query attention.
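A minimal sketch of the core data structure, assuming an autoregressive decoder that appends one token's keys and values per step. The `KVCache` name, the preallocation strategy, and the shapes used are illustrative, not from the posting.

```python
import numpy as np

class KVCache:
    """Per-layer key/value cache for autoregressive decoding (sketch).

    Preallocates [max_seq_len, num_kv_heads, head_dim] buffers for K and V
    so each decode step is an O(1) write instead of a reallocation.
    """

    def __init__(self, max_seq_len: int, num_kv_heads: int, head_dim: int,
                 dtype=np.float16):
        shape = (max_seq_len, num_kv_heads, head_dim)
        self.k = np.zeros(shape, dtype=dtype)
        self.v = np.zeros(shape, dtype=dtype)
        self.seq_len = 0  # number of valid cached positions

    def append(self, k_new: np.ndarray, v_new: np.ndarray) -> None:
        """Write this step's keys/values at the next free position."""
        if self.seq_len >= self.k.shape[0]:
            raise RuntimeError("cache full; evict or grow before appending")
        self.k[self.seq_len] = k_new
        self.v[self.seq_len] = v_new
        self.seq_len += 1

    def view(self):
        """Return the valid prefix that attention should read."""
        return self.k[:self.seq_len], self.v[:self.seq_len]
```

With multi-query attention, `num_kv_heads` is 1 (or a small group count for grouped-query attention), which shrinks the cache by the ratio of query heads to KV heads while leaving the rest of the structure unchanged.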
Approach Tips
Discuss how the KV cache grows with sequence length and batch size. Cover PagedAttention for non-contiguous memory and cache sharing across requests.
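To quantify that growth, a back-of-envelope formula helps: the cache holds keys and values (the factor of 2) for every layer, KV head, head dimension, position, and batch element. The `kv_cache_bytes` name and the 7B-class config below (32 layers, 32 KV heads, head_dim 128, fp16) are a hypothetical example, not tied to a specific source.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch,
                   bytes_per_elem=2):
    # 2x for keys and values; fp16 -> 2 bytes per element
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch * bytes_per_elem)

# Example: a 7B-class model (32 layers, 32 KV heads, head_dim 128, fp16)
per_token = kv_cache_bytes(32, 32, 128, seq_len=1, batch=1)
full_ctx = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=8)
print(per_token / 2**10, "KiB per token")        # 512.0 KiB
print(full_ctx / 2**30, "GiB at 4096 tokens, batch 8")  # 16.0 GiB
```

This linear blow-up in seq_len × batch is why PagedAttention allocates the cache in fixed-size blocks mapped through a per-sequence block table: fragmented free memory stays usable, and requests with identical prefixes can share blocks instead of duplicating them.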
Related LeetCode Problem
LC #146 - LRU Cache
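The LC #146 connection is the eviction policy: when serving many requests under a fixed memory budget, the caches of idle sequences can be evicted LRU-style. A sketch using Python's `OrderedDict`; the `SequenceCacheManager` name and its byte-level accounting are illustrative.

```python
from collections import OrderedDict

class SequenceCacheManager:
    """Evicts least-recently-used per-request KV caches under a byte budget."""

    def __init__(self, budget_bytes: int):
        self.budget = budget_bytes
        self.used = 0
        self.caches = OrderedDict()  # request_id -> (cache, size_bytes)

    def get(self, request_id):
        if request_id not in self.caches:
            return None
        self.caches.move_to_end(request_id)  # mark as recently used
        return self.caches[request_id][0]

    def put(self, request_id, cache, size_bytes):
        if request_id in self.caches:
            self.used -= self.caches.pop(request_id)[1]
        # Evict oldest entries until the new cache fits the budget.
        while self.caches and self.used + size_bytes > self.budget:
            _, (_, evicted_bytes) = self.caches.popitem(last=False)
            self.used -= evicted_bytes
        self.caches[request_id] = (cache, size_bytes)
        self.used += size_bytes
```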
Sources
Blind · SDE-3 · 2026-03-25
Glassdoor · Staff · 2025-12-05
Company: OpenAI (AI)
Typically appears in: Phone Screen
60 min — Coding problem focused on algorithms and systems thinking.