
Implement a KV Cache for Transformer Inference

Coding · Hard · Common

kv-cache · transformer · gpu-memory · inference

Reported

7 times

Last seen

2026-03-25

First seen

2025-08-10

Active in

2025, 2026

Description

Build an efficient key-value cache for transformer model inference. Handle memory management, eviction, and multi-query attention.
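As a starting point, the cache can be sketched per layer: each decode step appends one token's K/V tensors, and a sliding window evicts the oldest entries when the window is full. This is a hypothetical minimal sketch, not any particular production implementation; with multi-query or grouped-query attention, `n_kv_heads` is smaller than the number of query heads, which shrinks the cache proportionally.

```python
from collections import deque


class LayerKVCache:
    """Sliding-window KV cache for one transformer layer (illustrative sketch).

    Eviction policy here is the simplest possible: deque(maxlen=...) silently
    drops the oldest token's K/V once the window is full. Real systems may
    instead preallocate fixed buffers or use attention-sink / block-wise
    eviction schemes.
    """

    def __init__(self, max_seq_len: int):
        self.keys = deque(maxlen=max_seq_len)
        self.values = deque(maxlen=max_seq_len)

    def append(self, k_new, v_new):
        # k_new / v_new: one token's tensors, e.g. shape [n_kv_heads, head_dim].
        self.keys.append(k_new)
        self.values.append(v_new)

    def view(self):
        # Full cached K/V history, used by attention over all retained positions.
        return list(self.keys), list(self.values)
```

Usage: with `max_seq_len=4`, appending six tokens leaves only the last four in the cache, since the two oldest were evicted by the window.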

Approach Tips

Discuss how the KV cache grows linearly with sequence length and batch size (roughly 2 × n_layers × seq_len × n_kv_heads × head_dim × dtype bytes per sequence). Cover PagedAttention for non-contiguous block allocation and cache sharing across requests.
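The PagedAttention idea can be sketched with a block table: physical KV memory is a pool of fixed-size blocks, and each sequence maps logical block indices to physical block ids, so its cache need not be contiguous and prefix blocks can later be shared across requests. This is an assumed, simplified API for discussion, not vLLM's actual interface.

```python
class PagedKVCache:
    """Block-table KV memory manager (simplified sketch of the PagedAttention idea).

    Tracks only block bookkeeping; the actual K/V tensors would live in a
    preallocated pool indexed by physical block id.
    """

    def __init__(self, n_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(n_blocks))  # free physical block ids
        self.tables = {}                   # seq_id -> [physical block ids]
        self.lens = {}                     # seq_id -> token count

    def append_token(self, seq_id):
        """Reserve a slot for one new token; allocate a block on boundary."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lens.get(seq_id, 0)
        if n % self.block_size == 0:       # current block full (or first token)
            if not self.free:
                raise MemoryError("no free KV blocks")
            table.append(self.free.pop())
        self.lens[seq_id] = n + 1
        # Where this token's K/V would be written: (physical block, slot in block).
        return table[-1], n % self.block_size

    def free_seq(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lens.pop(seq_id, None)
```

A natural follow-up in the interview is extending this with per-block reference counts so that a shared prompt prefix occupies one set of physical blocks across many requests, with copy-on-write when a sequence diverges.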

Related LeetCode Problem

LC #146 - LRU Cache

Sources

Blind·SDE-3·2026-03-25
Glassdoor·Staff·2025-12-05
OA

OpenAI · AI

Typically appears in: Phone Screen

60 min — Coding problem focused on algorithms and systems thinking.