Mistral

Mixtral of Experts

Mixtral: A Mixture-of-Experts Model

📅 2024-01-03 · 👤 Mistral AI Team · 📄 arXiv: 2401.04088
Mixtral · MoE · Mixture of Experts · Sparse Activation · Open Source

Abstract

Mixtral 8x7B is a sparse mixture-of-experts (MoE) model with 47B total parameters, of which only ~13B are active per token at inference. It outperforms Llama 2 70B while keeping inference cost and latency comparable to a 7B model. Mixtral supports a 32K-token context window, uses Sliding Window Attention, and can be fine-tuned with LoRA. The model is released under the Apache 2.0 license.
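The sparse-activation idea above can be sketched in a few lines: a router scores 8 experts per token, only the top-2 run, and their outputs are mixed by a softmax over the selected scores. This is a minimal toy illustration with hypothetical random weights and a plain ReLU FFN standing in for the real expert blocks, not Mixtral's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 16, 32, 8, 2  # toy sizes, not Mixtral's

# Hypothetical weights: a linear router plus 8 small two-layer FFN experts.
W_gate = rng.normal(size=(d_model, n_experts))
experts = [
    (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
    for _ in range(n_experts)
]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(x):
    """Sparse MoE for one token: run only the top-2 of 8 experts."""
    logits = x @ W_gate                    # router scores, shape (n_experts,)
    top = np.argsort(logits)[-top_k:]      # indices of the 2 best-scoring experts
    weights = softmax(logits[top])         # normalize over the selected scores only
    out = np.zeros(d_model)
    for w, i in zip(weights, top):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0.0) @ w2)  # ReLU FFN stand-in
    return out

y = moe_layer(rng.normal(size=d_model))
print(y.shape)  # (16,)
```

Because only 2 of the 8 expert FFNs execute per token, roughly a quarter of the expert parameters are touched at inference, which is how a 47B-parameter model can run at the cost of a ~13B dense one.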
