Mistral 7B: An Efficient Open-Source Large Language Model
Mistral 7B is a powerful 7B-parameter open-source language model that outperforms Llama 2 13B on most benchmarks. It combines grouped-query attention (GQA), sliding window attention (SWA), and pre-fill key-value cache optimizations to strike an exceptional balance between performance and efficiency. Mistral 7B was trained using roughly half the FLOPs of Llama 2 13B.
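To make the sliding window attention idea concrete: each token attends only to the most recent `window` tokens (plus itself), which caps per-layer KV-cache size at the window length regardless of sequence length. Below is a minimal, dependency-free sketch of the attention mask only, not a full attention implementation; the function name and parameters are illustrative, not from the Mistral codebase.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal sliding-window mask: position i may attend to positions j
    with i - window < j <= i, so each row has at most `window` True entries."""
    return [[(j <= i) and (j > i - window) for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(6, 3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# x.....
# xx....
# xxx...
# .xxx..
# ..xxx.
# ...xxx
```

Because information can still propagate one window per layer, a deep stack of such layers gives an effective attention span far larger than the window itself.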