Time to First Token

A latency metric measuring how long a user waits from sending a prompt until the first response token appears, driven mainly by the compute-bound prefill phase.

Time to first token (TTFT) measures the delay between submitting a prompt and receiving the first token of the response. It mainly reflects prefill latency, which grows with input length because the model must process the entire prompt before generation can begin; in a deployed system the measured value also includes any queueing and network delay. TTFT is a key user-experience metric for interactive applications. It is distinct from time per output token (TPOT), which measures how fast subsequent tokens stream after the first one appears.
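As a rough illustration of how the two metrics can be separated in practice, the sketch below times any iterable that yields tokens as they arrive. The names `measure_stream` and `fake_stream` and the `clock` parameter are hypothetical, chosen for this example rather than taken from any library. The sketch assumes the request is only issued once iteration begins (as with a lazy generator), so that the start timestamp is a fair stand-in for the moment the prompt is submitted.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], Optional[float], int]:
    """Return (ttft_seconds, tpot_seconds, token_count) for a token stream.

    ttft is the time from the start of iteration to the first token.
    tpot is the mean gap between consecutive tokens after the first,
    or None when fewer than two tokens arrive.
    """
    start = clock()  # assumed to coincide with submitting the prompt
    first_at: Optional[float] = None
    last_at = start
    count = 0
    for _ in tokens:
        now = clock()
        if first_at is None:
            first_at = now  # arrival of the first token
        last_at = now
        count += 1
    if first_at is None:
        return None, None, 0  # the stream produced no tokens
    ttft = first_at - start
    tpot = (last_at - first_at) / (count - 1) if count > 1 else None
    return ttft, tpot, count


def fake_stream(prefill_s: float, per_token_s: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a model: one prefill delay, then steady decoding."""
    time.sleep(prefill_s)  # simulated prefill before the first token
    for i in range(n):
        if i > 0:
            time.sleep(per_token_s)  # simulated decode step
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, tpot, n = measure_stream(fake_stream(0.2, 0.02, 10))
    print(f"TTFT={ttft:.3f}s  TPOT={tpot:.3f}s  tokens={n}")
```

With these definitions, the time until the last token arrives is TTFT + TPOT × (n − 1), which is why a long prompt with a short answer tends to be dominated by TTFT, while a long answer tends to be dominated by TPOT. When measuring a real service from the client side, the TTFT reported this way will also include network and queueing time.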

Also known as

TTFT, first token latency