Positional Encoding

A technique that injects sequence position information into transformer inputs, compensating for the architecture's lack of inherent ordering.

Because transformers process all tokens in parallel, self-attention is permutation-invariant: without position information, a sentence would be treated as an unordered bag of tokens. Positional encoding solves this by adding position-dependent vectors to the word embeddings before the attention computation. The original transformer used fixed sinusoidal functions of token position; modern models often use learned encodings or rotary positional encoding (RoPE), which instead rotates query and key vectors by position-dependent angles inside the attention computation.
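The original sinusoidal scheme can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function name and the 10000 base constant follow the original transformer paper, and each even dimension gets a sine of a position-dependent angle while the adjacent odd dimension gets the matching cosine:

```python
import numpy as np

def sinusoidal_encoding(seq_len, d_model):
    # One row per position, one column per embedding dimension.
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]       # (1, d_model // 2)
    # Angle frequencies decay geometrically across dimension pairs.
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions
    pe[:, 1::2] = np.cos(angles)               # odd dimensions
    return pe

# Added elementwise to token embeddings before the first attention layer:
#   x = token_embeddings + sinusoidal_encoding(seq_len, d_model)
```

Because the values are fixed functions rather than learned parameters, the same encoding extends to any sequence length without retraining.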

Also known as

position encoding, positional embedding; a common variant is rotary positional encoding (RoPE)