A neural network architecture that routes each input to a subset of specialized sub-networks (experts), enabling very large model capacity at a fraction of the computational cost of dense models.
Mixture of Experts (MoE) is an architecture in which a model contains many specialized sub-networks called "experts," along with a learned routing mechanism that selects which experts to activate for each input token. Rather than passing every token through every parameter in the network, the model uses only a small fraction of its total parameters for each token.
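To make the routing step concrete, here is a minimal sketch of top-k gating in plain Python. It assumes a linear router and renormalizes the gate weights over the selected experts only, which is one common variant and not the only design. The experts are toy scaling functions, and every name here (`top_k_route`, `moe_forward`, `router_weights`) is hypothetical and chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, gate_weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Assumption: gate weights are normalized over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(token, router_weights, experts, k=2):
    """Run one token through only its top-k experts and mix their outputs."""
    # One router logit per expert: dot product of the router row with the token.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    routes = top_k_route(logits, k)
    out = [0.0] * len(token)
    for idx, weight in routes:
        expert_out = experts[idx](token)  # the other experts are never called
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, routes


# Toy setup: 8 experts, each of which just scales its input (hypothetical).
random.seed(0)
dim, num_experts = 4, 8
router_weights = [[random.gauss(0, 1) for _ in range(dim)]
                  for _ in range(num_experts)]
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(num_experts)]

token = [0.5, -1.0, 0.25, 2.0]
output, routes = moe_forward(token, router_weights, experts, k=2)
print(routes)  # two (expert_index, gate_weight) pairs
```

In a real model the router and experts are trained jointly, and each expert is a feed-forward block inside a transformer layer; the sketch only shows that six of the eight experts do no work for this token.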
This design allows models to scale to extremely large parameter counts while keeping inference costs manageable. A 140B-parameter MoE model might activate only 20B parameters per token, making its per-token compute roughly comparable to that of a 20B dense model. Mistral AI's Mixtral 8x7B is a widely used open-weight example of this architecture: it has roughly 47B total parameters and uses about 13B of them per token.
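The gap between total and active parameters comes from simple arithmetic, sketched below. The split is hypothetical: it assumes 12B parameters shared by every token (attention, embeddings) and 32 experts of 4B parameters each with top-2 routing, aggregated across all layers. Those values were picked only because they reproduce the 140B and 20B figures above; they do not describe any particular model.

```python
def moe_param_counts(shared_b, expert_b, num_experts, top_k):
    """Return (total, active) parameter counts in billions.

    shared_b    -- parameters every token passes through (hypothetical split)
    expert_b    -- parameters per expert, summed over all layers
    num_experts -- number of experts available to the router
    top_k       -- number of experts the router activates per token
    """
    total = shared_b + num_experts * expert_b
    active = shared_b + top_k * expert_b
    return total, active


total, active = moe_param_counts(shared_b=12, expert_b=4, num_experts=32, top_k=2)
print(total, active)  # 140 20
```

Raising `num_experts` grows the total count without changing the active count, which is why capacity can scale faster than per-token compute.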
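MoE has become central to the efficiency arms race in foundation model development. For AI practitioners, understanding MoE matters when evaluating model deployment costs: a high-parameter MoE model can deliver quality approaching that of a dense model far larger than its active parameter count while requiring significantly less compute per token. The savings are in compute, not memory: all experts must still be loaded, so GPU memory requirements follow the total parameter count.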