Prefill
The first phase of LLM inference where all input tokens are processed simultaneously through matrix-matrix multiplication, saturating GPU compute capacity.
Prefill occurs when a prompt first arrives at an LLM and the model processes all input tokens in parallel. This phase performs matrix-matrix multiplication, the type of highly parallelized work GPUs excel at, and is compute-bound rather than memory-bound. Prefill latency determines time to first token (TTFT) and scales with input length. Some serving architectures disaggregate prefill onto separate hardware optimized for its compute-heavy profile.
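The contrast between prefill's matrix-matrix work and token-at-a-time matrix-vector work can be sketched with NumPy. The shapes and names here are illustrative assumptions, not from any real serving framework:

```python
import numpy as np

# Toy sketch: prefill applies one weight matrix to the whole prompt in a
# single matrix-matrix multiply, whereas decode-style processing would do
# one matrix-vector product per token.
rng = np.random.default_rng(0)
seq_len, d_model = 8, 16  # hypothetical prompt length and hidden size
prompt = rng.standard_normal((seq_len, d_model))  # all input token embeddings
W = rng.standard_normal((d_model, d_model))       # one projection weight

# Prefill: all tokens in parallel -> (seq_len, d_model) @ (d_model, d_model)
prefill_out = prompt @ W

# Sequential alternative: one matrix-vector product per token
seq_out = np.stack([prompt[i] @ W for i in range(seq_len)])

assert np.allclose(prefill_out, seq_out)  # same math, batched into one matmul
```

The results are identical; the difference is that the single large matmul keeps the GPU's compute units saturated, which is why prefill is compute-bound while per-token decode tends to be memory-bound.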
Also known as
prefill phase, prompt processing