DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

📅 2025-12-02 · 👤 DeepSeek Team · 📄 arXiv: 2512.02556 · 📊 Difficulty: Medium

Tags: Sparse Attention, Reinforcement Learning, MoE, Foundation Model

Abstract

DeepSeek-V3.2 introduces DeepSeek Sparse Attention (DSA), a sparse attention mechanism, together with a large-scale reinforcement learning framework, achieving significant improvements in reasoning and agent capabilities. DSA dynamically selects the key tokens that take part in the attention computation, which significantly reduces computational complexity while preserving accuracy. Combined with an improved MoE routing strategy, V3.2 sets new records for open-source models on multiple benchmarks.
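
To make the complexity claim concrete, the sketch below counts how many query-key pairs the attention step scores under causal dense attention versus a scheme in which each query attends to at most k selected earlier tokens. The function names and the fixed per-query budget `k` are assumptions made for illustration; this summary does not state how DSA sets its budget, and the count ignores whatever it costs to choose the tokens in the first place.

```python
def dense_pairs(seq_len):
    """Query-key pairs scored by causal dense attention: 1 + 2 + ... + seq_len."""
    return seq_len * (seq_len + 1) // 2


def sparse_pairs(seq_len, k):
    """Query-key pairs scored when each query attends to at most k tokens.

    The first k queries see fewer than k tokens, so they behave like dense
    attention; every later query scores exactly k pairs.
    k is a hypothetical fixed budget, not a value taken from the paper.
    """
    if seq_len <= k:
        return dense_pairs(seq_len)
    return dense_pairs(k) + (seq_len - k) * k


if __name__ == "__main__":
    for length in (1_000, 10_000, 100_000):
        print(length, dense_pairs(length), sparse_pairs(length, k=512))
```

Under this assumption the dense count grows quadratically with sequence length, while the sparse count grows linearly once the sequence is longer than k.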

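Key Contributions

Introduces DeepSeek Sparse Attention (DSA), a sparse attention mechanism
A large-scale reinforcement learning framework that yields substantial gains in reasoning and agent capabilities
DSA dynamically selects key tokens for the attention computation, significantly reducing computational complexity
An improved MoE routing strategy, with new records for open-source models on multiple benchmarks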
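
The idea of selecting key tokens before attending can be sketched for a single query as follows. This is a minimal illustration of generic top-k sparse attention, not DSA's actual implementation: `index_scores` (one selection score per key) and the budget `k` are hypothetical inputs, since this summary does not describe how DSA scores or picks tokens.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparse_attention(query, keys, values, index_scores, k):
    """Attend over only the k keys with the highest index_scores.

    index_scores and k are assumptions for illustration; how the real
    model produces selection scores is not covered by this summary.
    Returns the selected positions and the attention output vector.
    """
    k = min(k, len(keys))
    # Rank positions by selection score; the sort is stable, so ties keep
    # the earlier position. Then restore positional order.
    selected = sorted(range(len(keys)),
                      key=lambda i: index_scores[i], reverse=True)[:k]
    selected.sort()
    # Standard scaled dot-product attention, restricted to selected keys.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in selected])
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, selected))
              for d in range(dim)]
    return selected, output
```

When `k` equals the number of keys, the function reduces to ordinary dense attention for that query, which is a convenient sanity check for any sparse variant.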

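Technical Details

Architecture	DeepSeek Sparse Attention (DSA) + improved MoE
Core innovation	Dynamic token selection + sparse attention computation
Training method	Large-scale reinforcement learning framework
Performance	Sets new records for open-source models on multiple benchmarks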
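
For readers new to MoE, the routing step named in the table can be sketched as generic top-k gating. This is a textbook-style illustration under stated assumptions, not the improved routing strategy of V3.2, which this summary does not detail: `route_top_k`, `moe_forward`, the softmax over the selected experts, and the toy experts are all hypothetical.

```python
import math


def route_top_k(router_logits, k):
    """Return (expert_index, gate_weight) pairs for the k highest logits.

    Gate weights are a softmax over the selected experts only, so they
    sum to 1. This normalization is an assumption for illustration.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda e: router_logits[e], reverse=True)[:k]
    m = max(router_logits[e] for e in top)
    exps = [math.exp(router_logits[e] - m) for e in top]
    total = sum(exps)
    return [(e, w / total) for e, w in zip(top, exps)]


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    out = [0.0] * len(x)
    for e, gate in route_top_k(router_logits, k):
        y = experts[e](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Toy experts: each maps a vector to a vector of the same length.
    toy_experts = [
        lambda v: [2.0 * a for a in v],
        lambda v: [-a for a in v],
        lambda v: list(v),
    ]
    print(moe_forward([1.0, 2.0], toy_experts, [1.0, -5.0, 1.0], k=2))
```

Only the k selected experts are evaluated for each token, which is why an MoE layer can hold many parameters while keeping per-token compute small.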

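💡 Reading Tips

Focus on how DSA reduces computational complexity while preserving accuracy. Reading the V3 paper first is recommended for background on the base architecture.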