Context Window
The maximum number of tokens a language model can process in a single forward pass, determining how much text the model can "see" at once.
Context window size directly determines what tasks a model can perform: longer windows enable processing entire codebases or documents in a single pass. However, because self-attention scales quadratically with sequence length (O(n²)), expanding the context window dramatically increases compute and memory costs. Modern models range from 4K to over 1 million tokens, with techniques like FlashAttention and hybrid architectures helping to manage the scaling challenge.
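A minimal sketch of the two ideas above: the quadratic growth of the attention score matrix, and the common fallback of truncating input to fit the window. The function names and token counts here are illustrative assumptions, not any particular model's API; real systems count tokens with a model-specific tokenizer rather than splitting on words.

```python
def attention_matrix_entries(n_tokens: int) -> int:
    """Self-attention compares every token with every other token,
    so each head computes an n x n matrix of attention scores."""
    return n_tokens * n_tokens

def truncate_to_window(tokens: list[str], max_context: int) -> list[str]:
    """Naive strategy when input exceeds the context window:
    keep only the most recent max_context tokens."""
    return tokens[-max_context:]

# Doubling the context length quadruples the attention matrix size.
for n in (4_096, 8_192, 131_072):
    print(f"{n:>7} tokens -> {attention_matrix_entries(n):,} score entries")
```

Going from 4K to 128K tokens (32x) multiplies the score-matrix size by roughly 1,000x, which is why long-context models rely on memory-efficient kernels like FlashAttention rather than materializing the full matrix.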
Also known as
context length, sequence length, max context, token limit