
Implement a KV Cache for Transformer Inference

Coding · Hard · Common

kv-cache · transformer · gpu-memory · inference

Reported

7 times

Last seen

2026-03-25

First seen

2025-08-10

Active in

2025, 2026

Description

Build an efficient key-value cache for transformer model inference. Handle memory management, eviction, and multi-query attention.
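As a starting point, the cache can be sketched per layer: each decode step appends one token's K/V tensors, and a sliding window evicts the oldest entries when the window is full. This is a hypothetical minimal sketch, not any particular production implementation; with multi-query or grouped-query attention, `n_kv_heads` is smaller than the number of query heads, which shrinks the cache proportionally.

```python
from collections import deque


class LayerKVCache:
    """Sliding-window KV cache for one transformer layer (illustrative sketch).

    Eviction policy here is the simplest possible: deque(maxlen=...) silently
    drops the oldest token's K/V once the window is full. Real systems may
    instead preallocate fixed buffers or use attention-sink / block-wise
    eviction schemes.
    """

    def __init__(self, max_seq_len: int):
        self.keys = deque(maxlen=max_seq_len)
        self.values = deque(maxlen=max_seq_len)

    def append(self, k_new, v_new):
        # k_new / v_new: one token's tensors, e.g. shape [n_kv_heads, head_dim].
        self.keys.append(k_new)
        self.values.append(v_new)

    def view(self):
        # Full cached K/V history, used by attention over all retained positions.
        return list(self.keys), list(self.values)
```

Usage: with `max_seq_len=4`, appending six tokens leaves only the last four in the cache, since the two oldest were evicted by the window.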

Approach Tips

Discuss how the KV cache grows linearly with sequence length and batch size (roughly 2 × n_layers × seq_len × n_kv_heads × head_dim × dtype bytes per sequence). Cover PagedAttention for non-contiguous block allocation and cache sharing across requests.
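The PagedAttention idea can be sketched with a block table: physical KV memory is a pool of fixed-size blocks, and each sequence maps logical block indices to physical block ids, so its cache need not be contiguous and prefix blocks can later be shared across requests. This is an assumed, simplified API for discussion, not vLLM's actual interface.

```python
class PagedKVCache:
    """Block-table KV memory manager (simplified sketch of the PagedAttention idea).

    Tracks only block bookkeeping; the actual K/V tensors would live in a
    preallocated pool indexed by physical block id.
    """

    def __init__(self, n_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(n_blocks))  # free physical block ids
        self.tables = {}                   # seq_id -> [physical block ids]
        self.lens = {}                     # seq_id -> token count

    def append_token(self, seq_id):
        """Reserve a slot for one new token; allocate a block on boundary."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lens.get(seq_id, 0)
        if n % self.block_size == 0:       # current block full (or first token)
            if not self.free:
                raise MemoryError("no free KV blocks")
            table.append(self.free.pop())
        self.lens[seq_id] = n + 1
        # Where this token's K/V would be written: (physical block, slot in block).
        return table[-1], n % self.block_size

    def free_seq(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lens.pop(seq_id, None)
```

A natural follow-up in the interview is extending this with per-block reference counts so that a shared prompt prefix occupies one set of physical blocks across many requests, with copy-on-write when a sequence diverges.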

Related LeetCode Problem

LC #146 - LRU Cache

Sources

Blind·SDE-3·2026-03-25
Glassdoor·Staff·2025-12-05
OA

OpenAI · AI

Typically appears in: Phone Screen

60 min — Coding problem focused on algorithms and systems thinking.