GLOSSARY

Mixture of Experts

DEFINITION

A neural network architecture that routes each input to a subset of specialized sub-networks (experts), enabling very large model capacity at a fraction of the per-token computational cost of a comparably sized dense model.

Mixture of Experts (MoE) is an architecture in which a model contains many specialized sub-networks called "experts," along with a learned routing mechanism that selects which experts to activate for each input token. Rather than passing every token through every parameter in the network, only a small fraction of the total parameters are used per forward pass.
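
The routing step described above can be illustrated with a minimal sketch of top-k routing, a common choice of routing rule. This is only an illustration under stated assumptions: the router scores are passed in directly rather than produced by a learned layer, the experts are toy functions, and the names route_token and moe_layer are hypothetical rather than taken from any library.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights.

    router_logits: one score per expert (assumed here; in a real model
    these would come from a learned layer applied to the token).
    Returns (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_layer(token, router_logits, experts, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    out = [0.0] * len(token)
    for idx, weight in route_token(router_logits, k):
        expert_out = experts[idx](token)  # unselected experts are never called
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out

# Toy experts: each one scales the token by a different constant.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.5]
selected = route_token(logits, k=2)           # experts 1 and 3 score highest
output = moe_layer([1.0, 1.0], logits, experts, k=2)
```

With four experts and k=2, only two expert functions run for this token; the other two contribute no compute at all.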

This design allows models to scale to very large parameter counts while keeping per-token compute manageable. A 140B-parameter MoE model might activate only 20B parameters per token, giving it per-token compute roughly comparable to that of a 20B dense model. Mistral's Mixtral 8x7B is a widely used open-weight example of this architecture: it routes each token to 2 of 8 experts per layer, so only about 13B of its roughly 47B total parameters are active per token.
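
The arithmetic behind the 140B / 20B example can be sketched as follows. The split into shared and per-expert parameters is an assumption chosen only to reproduce those two figures; it does not describe any particular model, and moe_param_counts is a hypothetical helper.

```python
def moe_param_counts(shared_b, expert_b, num_experts, top_k):
    """Return (total, active) parameter counts in billions.

    shared_b: parameters every token uses (e.g. attention, embeddings, router)
    expert_b: parameters in a single expert, summed across layers
    """
    total = shared_b + num_experts * expert_b   # everything stored in the model
    active = shared_b + top_k * expert_b        # what one token actually touches
    return total, active

# Hypothetical split: 12B shared, 32 experts of 4B each, 2 experts per token.
total, active = moe_param_counts(shared_b=12, expert_b=4, num_experts=32, top_k=2)
fraction_active = active / total  # share of parameters used per token
```

Under these assumed numbers the model stores 140B parameters but uses 20B per token, one seventh of the total.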

MoE has become central to the push for efficiency in foundation model development. For AI practitioners, understanding MoE matters when evaluating model deployment costs: an MoE model can deliver quality approaching that of a dense model with a similar total parameter count while requiring far less compute per token. The memory footprint does not shrink in the same way, because every expert must stay loaded even though only a few run for any given token.
