State Space Model
A class of sequence models that process a length-n sequence in O(n) time by maintaining a fixed-size hidden state, offering an efficient alternative to the O(n²) cost of self-attention.
State space models like Mamba, RWKV, and RetNet offer constant memory per token and linear scaling with sequence length, making them attractive for very long contexts. While they show strong results at smaller scales, pure state space models haven't matched transformer performance at frontier scale. The emerging compromise is hybrid architectures that interleave linear-time layers with periodic full-attention layers for tasks requiring exact recall.
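The core mechanism behind the linear-time claim can be sketched as a discrete linear recurrence: a fixed-size state h is updated once per token, so cost grows linearly with sequence length and memory stays constant. This is a minimal illustrative sketch, not the actual Mamba or RWKV update (those use input-dependent, gated parameterizations); the matrices A, B, C and the function name `ssm_scan` are made up for the example.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Minimal discrete linear state space recurrence:
        h_t = A @ h_{t-1} + B @ x_t
        y_t = C @ h_t
    The state h has fixed size regardless of sequence length,
    so one pass over n tokens costs O(n) time and O(1) memory
    beyond the outputs."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:              # one update per token
        h = A @ h + B @ x_t    # fixed-size state update
        ys.append(C @ h)       # per-token readout
    return np.array(ys)

# Toy example: scalar tokens, 2-dimensional state.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(2)            # decaying state transition (stable)
B = rng.standard_normal((2, 1))
C = rng.standard_normal((1, 2))
x = rng.standard_normal((6, 1))  # sequence of 6 scalar tokens
y = ssm_scan(x, A, B, C)
print(y.shape)  # one output per token: (6, 1)
```

Note the contrast with attention: here each step touches only the fixed-size state, never the full history, which is also why exact recall of arbitrary past tokens is hard and why hybrids add back occasional full-attention layers.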
Also known as
SSM, state-space model, linear attention, sub-quadratic attention; notable instances include Mamba and RWKV