Mixture of Experts
A neural network architecture that routes inputs to specialized subnetworks (experts), activating only a fraction of total parameters per inference to reduce compute costs.
Mixture of Experts (MoE) is a sparse architecture in which a gating network routes each input to a subset of specialized expert subnetworks. While the total model may have hundreds of billions of parameters, only a small fraction is active for any given input, enabling larger model capacity without a proportional increase in inference cost. Modern implementations such as DeepSeek V3 and Mixtral combine learned routing mechanisms with techniques like expert parallelism to scale MoE to frontier model sizes.
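The routing idea can be illustrated with a minimal top-k gating sketch. This is a simplified, hypothetical example (not any specific model's implementation): the gate scores every expert, keeps only the top `k`, normalizes their scores with a softmax, and combines just those experts' outputs, so the remaining experts do no work for this input.

```python
import numpy as np

def top_k_routing(x, gate_weights, experts, k=2):
    """Route a single input vector x through the top-k experts.

    Hypothetical sketch: gate_weights has shape (d, n_experts),
    experts is a list of callables mapping a (d,) vector to a (d,) vector.
    """
    logits = x @ gate_weights                 # one gating score per expert
    top = np.argsort(logits)[-k:]             # indices of the k highest-scoring experts
    # Softmax over only the selected experts' logits (numerically stabilized).
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()
    # Weighted combination of the chosen experts; the others stay inactive.
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate = rng.standard_normal((d, n_experts))
# Each "expert" here is just a random linear map, standing in for an FFN block.
mats = [rng.standard_normal((d, d)) for _ in range(n_experts)]
experts = [lambda v, m=m: m @ v for m in mats]

y = top_k_routing(rng.standard_normal(d), gate, experts, k=2)
print(y.shape)
```

With `k=2` of 4 experts, only half the expert parameters are touched per input; real systems apply the same pattern per token, with many more experts and an added load-balancing loss so routing does not collapse onto a few experts.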
Also known as
MoE, sparse MoE, expert routing