Mixtral: A Mixture-of-Experts Model
Mixtral 8x7B is a sparse mixture-of-experts (MoE) model with 47B total parameters, of which only ~13B are active for any given token at inference time. It outperforms Llama 2 70B on most benchmarks while keeping inference cost and latency comparable to a 7B model. Mixtral supports a 32K context window, uses Sliding Window Attention, can be fine-tuned with LoRA, and is released under the Apache 2.0 license.
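To make the sparse-activation idea concrete, below is a minimal PyTorch sketch of an MoE feed-forward block with top-2 routing over 8 experts. The toy dimensions, the linear router, and the plain GELU expert MLPs are illustrative assumptions rather than Mixtral's actual implementation; the point is only that each token is routed to 2 of the 8 experts, so most expert parameters sit idle on any single forward pass.

```python
# Minimal sketch of a sparse MoE feed-forward layer with top-2 routing,
# in the spirit of Mixtral 8x7B. Dimensions are toy values, not the
# real model's hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, dim=64, hidden_dim=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # The router scores every expert for each token.
        self.router = nn.Linear(dim, num_experts, bias=False)
        # Each expert is an independent feed-forward network (simple GELU
        # MLPs here to keep the sketch short).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(),
                          nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq_len, dim)
        scores = self.router(x)                                 # (B, T, E)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)   # top-2 experts per token
        weights = F.softmax(weights, dim=-1)                    # normalize the 2 gates
        out = torch.zeros_like(x)
        # Only the selected experts run for each token, so roughly 2/8 of
        # the expert parameters are active per forward pass.
        for e, expert in enumerate(self.experts):
            mask = idx == e                                     # (B, T, top_k)
            if mask.any():
                token_mask = mask.any(dim=-1)                   # tokens routed to expert e
                gate = (weights * mask).sum(dim=-1)[token_mask].unsqueeze(-1)
                out[token_mask] += gate * expert(x[token_mask])
        return out

if __name__ == "__main__":
    layer = MoEFeedForward()
    y = layer(torch.randn(2, 16, 64))
    print(y.shape)  # torch.Size([2, 16, 64])
```

This is how 47B total parameters translate into only ~13B active per token: the attention and embedding weights are shared, and only 2 of the 8 expert MLPs in each layer participate in a given token's computation.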