Query, Key, Value (Q/K/V)
The three vectors each token produces in attention: the query expresses what the token is looking for, the key advertises what the token contains, and the value carries the information passed forward when that token is selected.
In the attention mechanism, each input embedding is multiplied by three separate learned weight matrices (WQ, WK, WV) to produce query, key, and value vectors. The dot product between a query and every key gives raw attention scores (how much each token should attend to every other); these scores are scaled by the square root of the key dimension and normalized with a softmax, and the resulting weights combine the corresponding value vectors into the attention output. In the original transformer, input embeddings were 512-dimensional and Q/K/V vectors were 64-dimensional per attention head (512 split across 8 heads).
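A minimal single-head sketch of this computation in NumPy. The function name, the random weight initialization, and the toy dimensions are illustrative assumptions, not from any particular library; the scaling and softmax follow the standard scaled dot-product formulation.

```python
import numpy as np

def scaled_dot_product_attention(X, W_q, W_k, W_v):
    """Single-head attention over a sequence of token embeddings X."""
    Q = X @ W_q  # queries: what each token is looking for
    K = X @ W_k  # keys: what each token contains
    V = X @ W_v  # values: the information passed forward
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # each query dotted with every key, scaled
    # softmax over keys turns scores into attention weights per query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # output: weighted sum of value vectors

# Dimensions from the original transformer: 512-d embeddings, 64-d Q/K/V per head
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 512))  # a sequence of 10 token embeddings
W_q, W_k, W_v = (0.05 * rng.standard_normal((512, 64)) for _ in range(3))
out = scaled_dot_product_attention(X, W_q, W_k, W_v)
print(out.shape)  # (10, 64): one 64-d attention output per token
```

In multi-head attention, this computation runs in parallel with separate WQ, WK, WV matrices per head, and the per-head outputs are concatenated back to the model dimension.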
Also known as
QKV, Q/K/V vectors