
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence


arXiv: 2406.11931 · 2024-06-19

Summary

DeepSeek-Coder-V2 adopts a 236B Mixture-of-Experts (MoE) architecture and breaks the barrier of closed-source models in code intelligence. It reaches leading performance on programming benchmarks such as HumanEval and MBPP, and supports a range of programming tasks, including code completion, code generation, code repair, and code explanation. Its pre-training covers 10.2 trillion tokens of code, math, and natural-language data in total.


[Abstract] This paper presents the architecture, training methodology, and experimental results of DeepSeek-Coder-V2.
Original: Qihao Zhu*, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, Xin Xie, Kang Guan, Yuxiang You, Aixin Liu, Qiushi Du, Wenjun Gao, Xuan Lu, Qinyu Chen, Yaohui Wang, Chengqi Deng, Jiashi Li, Chenggang Zhao, Chong Ruan, Fuli Luo, Wenfeng Liang (DeepSeek-AI). https://github.com/deepseek-ai/DeepSeek-Coder-V2. Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves perfor...


(See the original text for this passage.) Through innovative architecture design and training methods, the DeepSeek team has made notable progress in this area. The model performs strongly on the relevant benchmarks, validating the approach, and the open release contributes to the open-source AI community; further optimization of these techniques is planned.
Original: source counterparts, contributing to the progress of code intelligence. However, there remains a discernible gap when comparing them to state-of-the-art closed-source models like GPT4-Turbo (OpenAI, 2023), Claude 3 Opus (Anthropic, 2024), and Gemini 1.5 Pro (Reid et al., 2024). To bridge this gap and further propel the development of open-source code models, we introduce the DeepSeek-Coder-V2 series. These models are built upon the foundation of DeepSeek-V2 (DeepSeek-AI, 2024) and are further pre-trained with an additional corpus with 6 trillion tokens. In the pre-training phase, the da...

Original: handle more complex and extensive coding tasks. After continuous pre-training DeepSeek-V2 on this multi-source corpus, we find that DeepSeek-Coder-V2 significantly enhances the model's capabilities in coding and mathematical reasoning while maintaining comparable general language performance. In the alignment phase, we first construct an instruction training dataset that includes code and math data from DeepSeek-Coder (Guo et al., 2024) and DeepSeek-Math (Shao et al., 2024), as well as general instruction data from DeepSeek-V2 (DeepSeek-AI, 2024). This dataset is used to fine-tune the bas...




1 Introduction
[Introduction] Background, motivation, and main contributions of DeepSeek-Coder-V2.
Original: The open-source community has made significant strides in advancing code intelligence through the development of open-source code models such as StarCoder (Li et al., 2023b; Lozhkov et al., 2024), CodeLlama (Roziere et al., 2023), DeepSeek-Coder (Guo et al., 2024), and Codestral (MistralAI, 2024). These models have steadily approached the performance levels of closed-source counterparts, contributing to the progress of code intelligence. However, there remains a discernible gap when comparing them to state-of-the-art closed-source models like GPT4-Turbo (OpenAI, 2023), Claude 3 Opus...

Original: DeepSeek-Coder-V2 has been exposed to 10.2T training tokens, where 4.2 trillion tokens originate from the DeepSeek-V2 dataset, while the remaining 6 trillion tokens come from the DeepSeek-Coder-V2 dataset. To accommodate longer code inputs and enhance applicability across various programming scenarios, we extend the context length from 16K to 128K tokens, allowing our models to handle more complex and extensive coding tasks. After continuous pre-training DeepSeek-V2 on this multi-source corpus, we find that DeepSeek-Coder-V2 significantly enhances the model's capabilities in coding and mathemati...

Original: We make the first attempt to develop an open-source hundred-billion-parameter code model to advance the field of code intelligence. Experimental results indicate that DeepSeek-Coder-V2 236B outperforms state-of-the-art closed-source models, such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro, in both coding and mathematics tasks. • DeepSeek-Coder-V2 models are released publicly under a permissive license, allowing for both research and unrestricted commercial use. 1.2 Summary of Evaluations and Metrics • Code: Regarding code generation benchmark evaluation, DeepSeek-Coder-V2 demonstrates rem...

Original: ...the OpenAI simple-eval pipeline. Regarding subjective evaluation with GPT-4 as a judge, DeepSeek-Coder-V2 achieves 65.0 on Arena-Hard (Li et al., 2024), 8.77 on MT-Bench (Zheng et al., 2023), and 7.84 on AlignBench (Liu et al., 2023c). These scores are significantly better than other code-specific models, and even comparable with general open-source models.

1.1 Contributions

Original: In summary, our main contributions are: • We introduce DeepSeek-Coder-V2 with 16B and 236B parameters based on the DeepSeekMoE framework, which has activation parameters of only 2.4B and 21B, efficiently supporting diverse computational and application needs. Additionally, DeepSeek-Coder-V2 supports 338 programming languages and a maximum context length of 128K tokens. • We make the first attempt to develop an open-source hundred-billion-parameter code model to advance the field of code intelligence. Experimental results indicate that DeepSeek-Coder-V2 236B outperforms state-of-the-art closed-...

1.2 Summary of Evaluations and Metrics

[Experimental results] Evaluation of DeepSeek-Coder-V2 across the benchmark suites.
Original: • Code: Regarding code generation benchmark evaluation, DeepSeek-Coder-V2 demonstrates remarkable superiority over all open-source models while exhibiting performance on par with the leading closed-source models, such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro. Notably, we achieve a 90.2% score on HumanEval (Chen et al., 2021), a 76.2% score on MBPP (Austin et al., 2021a) (establishing a new state-of-the-art result with the EvalPlus evaluation pipeline), and a 43.4% score on LiveCodeBench (Jain et al., 2024) (questions from Dec. 2023 to June 2024). Additionally, DeepSeek-Coder-V2 is...

2 Data Collection

[Data/Training] Construction of the pre-training data and the training pipeline for DeepSeek-Coder-V2.
Original: The pre-training data for DeepSeek-Coder-V2 primarily consists of 60% source code, 10% math corpus, and 30% natural language corpus. Since the natural language corpus is directly sampled from the training dataset of DeepSeek-V2, this section focuses on the collection, cleaning, and filtering processes of the code and math data. Meanwhile, we further validate the quality of this data through comparative analysis experiments. We collect public repositories created before November 2023 on GitHub. We first apply the same filtering rules and near-deduplication as those used in DeepSeek-Coder (G...
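The 60/10/30 mixture described above amounts to a per-source token budget. A minimal sketch, where the helper name is mine and the 6-trillion-token budget is used purely as an illustration of the stated ratios:

```python
# Sketch: splitting a pre-training token budget across the stated mixture
# (60% source code, 10% math, 30% natural language). `allocate_tokens` is a
# made-up helper name; the 6T figure below is the additional corpus size
# mentioned in the paper, used here only as an example budget.

def allocate_tokens(total_tokens: int, mixture: dict[str, float]) -> dict[str, int]:
    """Split a token budget according to per-source sampling ratios."""
    assert abs(sum(mixture.values()) - 1.0) < 1e-9, "ratios must sum to 1"
    # round() avoids float truncation artifacts like 0.6 * 6e12 -> 3599999999999
    return {source: round(total_tokens * ratio) for source, ratio in mixture.items()}

budget = allocate_tokens(
    6_000_000_000_000,
    {"code": 0.60, "math": 0.10, "natural_language": 0.30},
)
```

Under these ratios the code share alone is 3.6T tokens, larger than many earlier code corpora in their entirety.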

Original: https://stackoverflow.com, library sites such as the PyTorch documentation (https://pytorch.org/docs), and mathematics websites such as StackExchange (https://math.stackexchange.com) as our initial seed corpus. Using this seed corpus, we train a fastText model (Joulin et al., 2016) to recall more coding-related and math-related web pages. Since tokenization for languages like Chinese cannot be done through spaces, we use the Byte Pair Encoding (BPE) tokenizer from DeepSeek-V2, which significantly improves the recall accuracy of fastText. For each domain, we calculate the percentage of web...
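The recall step described above combines two ideas: pre-tokenizing text with a BPE tokenizer so that fastText (which expects space-separated tokens) works for languages without spaces, and then keeping whole domains whose recalled share of pages is high enough. A minimal sketch, where the classifier is a stand-in and the 10% threshold is an assumed illustration, not the paper's exact setting:

```python
# Sketch of BPE pre-tokenization for fastText plus domain-level filtering.
# `bpe_encode` and `looks_related` are stand-ins for the real tokenizer and
# the trained fastText classifier; the threshold value is illustrative.
from collections import defaultdict
from urllib.parse import urlparse

def pretokenize(text: str, bpe_encode) -> str:
    # fastText expects space-separated tokens; languages like Chinese have no
    # spaces, so the text is first segmented with a BPE tokenizer.
    return " ".join(bpe_encode(text))

def domains_to_keep(pages: list[tuple[str, bool]], threshold: float = 0.10) -> set[str]:
    """pages: (url, recalled_by_classifier). Keep domains whose recalled
    fraction of pages exceeds `threshold`."""
    total = defaultdict(int)
    recalled = defaultdict(int)
    for url, hit in pages:
        domain = urlparse(url).netloc
        total[domain] += 1
        recalled[domain] += hit
    return {d for d in total if recalled[d] / total[d] > threshold}

kept = domains_to_keep([
    ("https://a.com/1", True),
    ("https://a.com/2", True),
    ("https://b.com/1", False),
])
# kept == {"a.com"}
```

Filtering at domain granularity lets the pipeline recover related pages from a site even when the classifier missed some of them individually.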

Original: ...respectively. Therefore, the new code corpus is superior to the code corpus used to train DeepSeek-Coder.

Model                 Tokens  Python  C++    Java   PHP    TS     C#     Bash   JS     Avg    MBPP
DeepSeek-Coder-1B     1T      30.5%   28.0%  31.7%  23.0%  30.8%  31.7%  9.5%   28.6%  26.7%  44.6%
DeepSeek-Coder-V2-1B  1T      36.0%   34.8%  31.7%  27.3%  37.7%  34.2%  6.3%   38.5%  31.2%  49.0%
DeepSeek-Coder-V2-1B  2T      37.2%   39.1%  32.3%  31.7%  34.6%  36.7%  12.0%  32.9%  32.0%  54.0%

Table 1: Performance of 1B base models between DeepSeek-Coder and DeepSeek-Coder-V2.

3 Training Policy





3.1 Training Strategy

Original: We use two training objectives for DeepSeek-Coder-V2 16B: Next-Token-Prediction and Fill-In-Middle (FIM) (Li et al., 2023b; Bavarian et al., 2022; Guo et al., 2024). For DeepSeek-Coder-V2 236B, we only utilize the Next-Token-Prediction objective. Here we give a brief introduction of the FIM training policy. We adopt the FIM training approach for the development of DeepSeek-Coder-V2-16B, leveraging the PSM (Prefix, Suffix, Middle) mode. This method structures the content reconstruction in the sequence Prefix, Suffix, Middle, as illustrated below: <|fim_begin|>f_pre<|fim_hol...
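The PSM packing described above can be sketched as follows. Only `<|fim_begin|>` is fully visible in the excerpt; `<|fim_hole|>` and `<|fim_end|>` are assumed spellings for the remaining sentinels, and the uniform choice of cut points is a common FIM convention, not necessarily the paper's exact procedure:

```python
import random

# Assumed sentinel spellings: only <|fim_begin|> appears intact in the excerpt.
FIM_BEGIN, FIM_HOLE, FIM_END = "<|fim_begin|>", "<|fim_hole|>", "<|fim_end|>"

def make_psm_example(code: str, rng: random.Random) -> str:
    # Pick two cut points, splitting the document into prefix/middle/suffix.
    i, j = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    # PSM order: the model sees prefix then suffix, and must generate the middle.
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}{middle}"

example = make_psm_example("def add(a, b):\n    return a + b\n", random.Random(0))
```

At inference time the same layout turns code insertion into ordinary left-to-right generation: the editor's text before and after the cursor fill the prefix and suffix slots, and the model completes the middle.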

3.2 Model Architecture

Original: Our architecture aligns with that of DeepSeek-V2 (DeepSeek-AI, 2024). The hyperparameter settings, 16B and 236B, correspond to those used in DeepSeek-V2-Lite and DeepSeek-V2, respectively. Notably, we encountered instability during training and spikes in gradient values, which we attributed to the exponential normalization technique. To address this, we reverted to the conventional normalization method.

3.3 Training Hyper-Parameters

Original: Consistent with the DeepSeek-V2 methodology (DeepSeek-AI, 2024), we utilize the AdamW optimizer (Loshchilov and Hutter, 2019), configured with β₁ = 0.9, β₂ = 0.95, and a weight decay of 0.1. Batch sizes and learning rates are adjusted according to DeepSeek-V2 specifications. For learning rate scheduling, we employ a cosine decay strategy, starting with 2000 warm-up steps and gradually reducing the learning rate to 10% of its initial value. Both DeepSeek-Coder-V2 and DeepSeek-Coder-V2-Lite are trained using the same metho...
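The stated schedule (linear warm-up for 2000 steps, then cosine decay down to 10% of the peak) can be written as a step-to-rate function. The total step count below is an illustrative placeholder; this excerpt does not give one:

```python
import math

# Sketch of the stated LR schedule: 2000-step linear warm-up, then cosine
# decay from the peak down to 10% of it. `total_steps` is an assumed value.
def lr_at(step: int, peak_lr: float, warmup: int = 2000,
          total_steps: int = 100_000, floor_ratio: float = 0.1) -> float:
    if step < warmup:
        return peak_lr * (step + 1) / warmup          # linear warm-up
    progress = (step - warmup) / max(1, total_steps - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    # cosine goes 1 -> 0, so the rate sweeps peak_lr -> floor_ratio * peak_lr
    return peak_lr * (floor_ratio + (1.0 - floor_ratio) * cosine)
```

The 10% floor keeps late-stage updates non-trivial, which matters when continued pre-training is expected to preserve capabilities from the base checkpoint.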

3.4 Long Context Extension

Original: Following DeepSeek-V2, we extend the context length of DeepSeek-Coder-V2 to 128K using YaRN (Peng et al., 2023). The hyper-parameters of YaRN are the same as DeepSeek-V2: the scale s is set to 40, α to 1, and β to 32. We further continue training the model in two stages to enhance its capability for handling long contexts. In the first stage, we utilize a sequence length of 32K and a batch size of 1152 for 1000 steps. In the second stage, we train the model for an additional 1000 steps, employing a sequence length of 128K and a batch size of 288 sequences. It should be noted her...
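The YaRN hyper-parameters above control a per-dimension blend of RoPE interpolation: frequencies that complete many rotations over the original context are left untouched, very slow ones are interpolated by 1/s, and a linear ramp between α and β mixes the two. This is a simplified paraphrase of the YaRN paper's ramp, not the exact DeepSeek implementation, and the original context length is an assumed example:

```python
import math

# Simplified sketch of YaRN-style RoPE rescaling with the stated
# hyper-parameters (s=40, alpha=1, beta=32). orig_ctx is illustrative.
def yarn_inv_freqs(dim: int, base: float = 10000.0, orig_ctx: int = 4096,
                   s: float = 40.0, alpha: float = 1.0, beta: float = 32.0):
    out = []
    for i in range(0, dim, 2):
        inv_freq = base ** (-i / dim)
        # How many full rotations this dimension makes over the original context.
        n_rotations = orig_ctx * inv_freq / (2 * math.pi)
        ramp = min(1.0, max(0.0, (n_rotations - alpha) / (beta - alpha)))
        # ramp=1: fast dimension, keep as-is; ramp=0: interpolate by 1/s.
        out.append(inv_freq * (ramp + (1.0 - ramp) / s))
    return out

freqs = yarn_inv_freqs(64)
```

Slow dimensions carry absolute-position information, so scaling only those by 1/s stretches the positional range without disturbing the fast dimensions that encode local order.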

3.5 Alignment

Original: 3.5.1 Supervised Fine-Tuning. To build DeepSeek-Coder-V2 Chat, we construct an instruction training dataset mixed with code and math data. We first collect 20k code-related and 30k math-related instruction examples from DeepSeek-Coder and DeepSeek-Math. To maintain general ability, we also sample data from the instruction data of DeepSeek-V2. Finally, we use an instruction dataset of 300M tokens. For training, we use a cosine schedule with 100 warm-up steps and an initial learning rate of 5e-6. We also use a batch size of 1M tokens and 1B tokens in tota...

3.5 Alignment

Original: ...all subsequent experiments. Reinforcement Learning Algorithm. We employ Group Relative Policy Optimization (GRPO) (Shao et al., 2024) as our RL algorithm, the same as DeepSeek-V2 uses. Notably, GRPO is proven to be quite effective and has a lower cost compared with PPO, since there is no need to maintain an additional critic model. Figure 3: Performances of Different Methods.
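The reason GRPO needs no critic is visible in its advantage estimate: several completions are sampled per prompt, and each one's advantage is its reward normalized against the group's own mean and standard deviation. A minimal sketch of just that step (clipping, KL regularization, and the policy-gradient update are omitted):

```python
# Sketch of the group-relative advantage at the heart of GRPO: the group's
# own reward statistics replace a learned value (critic) baseline.
def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# e.g. four sampled solutions for one prompt, scored by a test-based reward
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline is computed per group, the advantages always sum to (approximately) zero within a prompt, so the update only has to rank the sampled completions against each other.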

4 Experimental Results

Original: In this section, we evaluate DeepSeek-Coder-V2 on three types of tasks, including coding, mathematics, and general natural language. We compare DeepSeek-Coder-V2 with the previous state-of-the-art large language models. • CodeLlama (Roziere et al., 2023) consists of a series of code language models based on Llama2 (Touvron et al., 2023), continue-pre-trained on datasets ranging from 500 to 1000 billion code tokens. These models are available in four sizes: 7B, 13B, 34B, and 70B. • StarCoder (Lozhkov et al., 2024) is a publicly accessible model with 15 billion parameters. It is specifi...







Original: ...several files in the repository, resulting in a long context, we collect 238 bugs that only need to modify one method from this benchmark. SWE-bench is a comprehensive benchmark designed to evaluate the performance of large language models in addressing real-world software issues sourced from GitHub. The benchmark presents a codebase alongside a specific issue, challenging the language model to generate a patch that effectively resolves the described problem. This rigorous evaluation framework ensures that the language model's ability to understand and fix real-world software issues is thorough...

Original: ...SWE-bench, closely approaching the results of leading closed-source models and demonstrating significant capability in handling longer code sequences. Notably, DeepSeek-Coder-V2-Instruct achieves the highest score of 73.7% on Aider, surpassing all other models listed, including closed-source counterparts. This superior performance highlights its efficiency and robustness in automated code repair tasks, positioning DeepSeek-Coder-V2-Instruct as the top open-source model and a formidable competitor to closed-source alternatives in the field. 4.4 Code Understanding and Reasoning. To assess the cod...

Original: When compared to larger closed-source models, there is a performance gap. This gap may largely be attributed to the fact that DeepSeek-Coder-V2-Instruct operates with only 21 billion activation parameters, which is considerably fewer than those in larger, more advanced closed-source models like GPT-4o. This limitation in model capacity could restrict its learning and problem-solving capabilities. 4.5 Mathematical Reasoning. To assess the mathematical reasoning capabilities of DeepSeek-Coder-V2, we utilized the popular grade-school benchmark GSM8K (Cobbe et al., 2021), along with adv...

Original: ...DeepSeek-Coder-V2 solves more problems from AIME 2024 than the other models, demonstrating its strong mathematical reasoning capabilities. 4.6 General Natural Language. As DeepSeek-Coder-V2 is built upon DeepSeek-V2, it inherits the strong natural language capability, even surpassing DeepSeek-V2 on reasoning-related benchmarks. We compare DeepSeek-Coder-V2 Instruct with DeepSeek-V2 Chat on standard benchmarks covering both English and Chinese, including BigBench Hard (BBH) (Suzgun et al., 2022), MMLU (Hendrycks et al., 2020), ARC (Clark et al., 2018), TriviaQA (Joshi et al., 2017), ...

Original: Table 10: A Comparison of DeepSeek-Coder-V2 Instruct with DeepSeek-V2 Chat. When comparing the performance of 16B models, it is evident that DeepSeek-Coder-V2-Lite-Instruct outperforms DeepSeek-V2-Lite-Chat in benchmarks like BBH and Arena-Hard. These benchmarks place a high demand on the model's reasoning ability, at which DeepSeek-Coder-V2-Lite-Instruct excels. However, DeepSeek-Coder-V2-Lite-Instruct falls behind in knowledge-intensive benchmarks like TriviaQA, primarily due to the relatively smaller amount of web data used during pre-training. Moving on to 236B models, DeepSeek-Coder-V2 Instruct ...

4.1 Code Generation

Original: HumanEval and MBPP Benchmarks. The HumanEval (Chen et al., 2021) and MBPP (Austin et al., 2021b) benchmarks are commonly utilized for assessing the performance of code-generating Large Language Models (LLMs). (Footnote: We use the template "Please complete the python function below. The final complete version of your function must be returned within a code block. Here is the unfinished function:\n```python\n{problem_description}\n\n" to build the instruction prompt.) HumanEval comprises 164 Python tasks that are verified through test cases to evaluate the performance of Code LLMs in a zero-shot s...
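The footnoted template can be turned into a prompt builder directly. The exact whitespace in the scraped excerpt is ambiguous, so the reading below is approximate, and the sample function stub is made up rather than a real HumanEval task:

```python
# Rebuilding the HumanEval instruction prompt from the template quoted in the
# footnote; whitespace is an approximate reading of the scraped excerpt.
TEMPLATE = (
    "Please complete the python function below. "
    "The final complete version of your function must be returned within a "
    "code block. Here is the unfinished function:\n"
    "```python\n{problem_description}\n\n"
)

def build_prompt(problem_description: str) -> str:
    return TEMPLATE.format(problem_description=problem_description)

# Hypothetical stub standing in for a benchmark problem.
prompt = build_prompt('def add(a, b):\n    """Return the sum of a and b."""')
```

Opening the code fence inside the prompt but leaving it unclosed nudges the model to continue the function and close the block itself, which makes the answer easy to extract.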

Original: ... 3.1% 48.4% 74.6% 68.9%

Claude-3-Opus              -    -     63.9%  55.9%  76.1%  60.3%  71.2%  64.6%  72.0%  70.8%
GPT-4-1106                 -    -     62.7%  57.8%  69.2%  60.9%  78.8%  64.0%  69.3%  72.5%
GPT-4-Turbo-0409           -    -     63.9%  56.5%  69.8%  61.5%  78.8%  63.4%  72.2%  72.3%
GPT-4o-0513                -    -     75.9%  65.2%  78.0%  60.9%  80.1%  64.6%  73.5%  76.4%
Open-Source Models
Codestral                  22B  22B   63.3%  49.7%  67.9%  32.1%  67.3%  37.3%  68.2%  63.2%
DS-Coder-instruct          33B  33B   61.4%  44.7%  53.5%  31.4%  68.6%  46.0%  70.1%  61.9%
Llama3-Instruct            70B  70B   55.1%  46.0%  62.9%  48.1%  58.3%  46.0%  68.8%  60.6%
DS-Coder-V2-Lite-Instruct  16B  2.4B  64.6%  47.8%  67.3%  45.5%  62.2%  41.6%  68.8%  65.6%
DS-Coder-V2-Ins...

Original: ...form larger counterparts. Competitive Programming. To further validate the model's capability on real-world competitive programming problems, we utilize LiveCodeBench (Jain et al., 2024) and the USACO benchmark (Shi et al., 2024) to estimate the effectiveness of DeepSeek-Coder-V2. LiveCodeBench is a meticulous and contamination-free assessment of Large Language Models (LLMs) for code generation, systematically gathering novel challenges over time from three prominent competitive programming platforms: LeetCode, AtCoder, and CodeForces. Since the cut-off of the training data is before Novembe...
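The contamination control described above is just a date-window filter: with a training cut-off before November 2023, only problems published afterwards are scored (the paper evaluates the Dec. 2023 to June 2024 window). A minimal sketch, with illustrative problem records:

```python
from datetime import date

# Sketch of the contamination-control step: keep only problems published
# after the training cut-off. The problem records below are illustrative.
CUTOFF = date(2023, 11, 30)
WINDOW_END = date(2024, 6, 30)

def in_eval_window(published: date) -> bool:
    return CUTOFF < published <= WINDOW_END

problems = [
    {"id": "lc-1", "published": date(2023, 9, 1)},   # before cut-off: excluded
    {"id": "lc-2", "published": date(2024, 2, 15)},  # inside window: kept
]
kept = [p["id"] for p in problems if in_eval_window(p["published"])]
# kept == ["lc-2"]
```

Because LiveCodeBench records a publication date per problem, the same filter can be re-applied for any model by sliding the window past that model's own training cut-off.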

Original: ...closely trailing the leading GPT-4-Turbo variant.

4.2 Code Completion

Original: 4.2.1 Repository-Level Code Completion Evaluation. We use RepoBench (Liu et al., 2023b) to evaluate the capabilities of currently available open-source code models with sizes below 35B on repository-level code completion tasks. This dataset is constructed from a diverse set of real-world, open-sourced, permissively licensed repositories in two popular programming languages: Python and Java. Notably, the latest version (v1.1) of RepoBench sources its data from GitHub repositories created between October 6th and December 31st, 2023, while our pre-training data includes code created before Novemb...

原文: 43.9% 45.7%
DS-Coder-V2-Lite-Base 16B 2.4B 38.3% 38.6% 40.6% 38.3% 38.7% 38.9% 48.8% 45.7% 42.4% 38.1% 41.1% 43.3%
Table 5: Performance of different models on December subset of RepoBench v1.1.
As shown in Table 5, the results indicate that the DeepSeek-Coder-V2-Lite-Base model, despite having only 2.4 billion active parameters, achieves code completion capabilities in Python comparable to the DeepSeek-Coder-Base 33B model and in Java comparable to the DeepSeek-Coder-Base 7B model. Compared to Codestral, the DeepSeek-Coder-V2-Lite-Base model has only one-tenth of the active parameters of Code...

(4.2 Code Completion - see original) On Fill-in-the-Middle (FIM) tasks, DeepSeek-Coder-V2-Lite-Base is competitive with much larger models such as DS-Coder-Base 33B and Codestral 22B despite its small number of active parameters.
原文: StarCoder (footnote: StarCoder-2 has some problems with FIM, thus we still use StarCoder.) 16B 16B 71.5% 82.3% 83.0% 80.2%
CodeLlama-Base 7B 7B 58.6% 70.6% 70.7% 68.0%
CodeLlama-Base 13B 13B 60.7% 74.3% 78.5% 73.1%
DS-Coder-Base 1B 1B 74.1% 85.1% 82.9% 81.8%
DS-Coder-Base 7B 7B 79.8% 89.6% 86.3% 86.1%
DS-Coder-Base 33B 33B 80.5% 88.4% 86.6% 86.4%
Codestral 22B 22B 77.2% 83.2% 85.9% 83.0%
DS-Coder-V2-Lite-Base 16B 2.4B 80.0% 89.1% 87.2% 86.4%
Table 6: Performance of different approaches on the FIM-Tasks.
The table presents the performance of various coding models on FIM (Fill-in-the-Middle) tasks ...
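FIM evaluation rearranges a file into a prefix, a suffix, and a middle span the model must generate. A minimal sketch of a prefix-suffix-middle (PSM) style prompt, assuming hypothetical sentinel tokens (each model family defines its own tokenizer-level FIM tokens):

```python
# Hypothetical sentinels for illustration only; real models (StarCoder,
# DS-Coder, ...) each define their own special FIM tokens.
FIM_BEGIN, FIM_HOLE, FIM_END = "<|fim_begin|>", "<|fim_hole|>", "<|fim_end|>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """PSM layout: the model is asked to generate the middle span
    that belongs at FIM_HOLE, emitted after FIM_END."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"

prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))\n")
```

The generated middle is then checked against the held-out ground-truth span.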

4.3 Code Fixing

(4.3 Code Fixing - see original) Bug-fixing ability is evaluated on Defects4J, SWE-bench, and Aider. Defects4J is a widely used program-repair dataset that collects real-world bugs from open-source projects such as Apache Commons, JFreeChart, and the Closure Compiler.
原文: To evaluate the bug-fixing capabilities of the model, we used the Defects4J (https://github.com/rjust/defects4j), SWE-bench (Jimenez et al., 2023), and Aider (https://github.com/paul-gauthier/aider) datasets for testing. Defects4J is a widely used dataset in the field of software engineering, specifically designed for the purpose of evaluating and testing program repair techniques. It consists of a collection of real-world software bugs from various open-source projects, including but not limited to Apache Commons, JFreeChart, and Closure Compiler. Each bug in the dataset is accompa...

(4.3 Code Fixing - see original) Among open-source models, DeepSeek-Coder-V2-Instruct leads on the repair benchmarks, and its Aider score (73.7%) exceeds that of GPT-4o-0513 (72.9%).
原文: % 63.9%
GPT-4o-0513 - - 26.1% 26.7% 72.9%
Open-Source Models
Codestral 22B 22B 17.8% 2.7% 51.1%
DS-Coder-Instruct 33B 33B 11.3% 0.0% 54.5%
Llama3-Instruct 70B 70B 16.2% - 49.2%
DS-Coder-V2-Lite-Instruct 16B 2.4B 9.2% 0.0% 44.4%
DS-Coder-V2-Instruct 236B 21B 21.0% 12.7% 73.7%
Table 7: Performances of different models on repair benchmarks. We do not evaluate Llama3-Instruct on SWE-Bench as it just supports 8K context length.
Table 7 outlines the performances of different language models on software repair benchmarks, including Defects4J, SWE-Bench, and Aider. Among open-source models, DeepSee...

4.4 Code Understanding and Reasoning

(4.4 Code Understanding and Reasoning - see original) Code reasoning is assessed with CRUXEval, a benchmark of 800 Python functions paired with input-output examples, split into two tasks (CRUXEval-I and CRUXEval-O) that probe reasoning over code in both the forward and reverse directions.
原文: To assess the code reasoning capabilities of our models, we utilize the CRUXEval benchmark. This benchmark comprises 800 Python functions paired with corresponding input-output examples. It is divided into two distinct tasks: CRUXEval-I, which requires the large language model (LLM) to predict the output based on the given input, and CRUXEval-O, where the model must predict the input from the known output. This structure challenges the model’s ability to understand and reason through Python code in both forward and reverse directions.
Model #TP #AP CruxEval-I-COT CruxEval-O-COT
Closed-Source M...
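An output-prediction task of this kind can be scored by executing the function on the stated input and comparing the result with the model's prediction. A sketch with hypothetical helper names (the official CRUXEval harness differs in its details):

```python
import ast

def check_output_prediction(fn_source: str, fn_name: str,
                            input_repr: str, predicted_repr: str) -> bool:
    """Run fn_name from fn_source on the literal argument tuple in
    input_repr; compare against the literal in predicted_repr."""
    namespace: dict = {}
    exec(fn_source, namespace)  # trusted benchmark code only
    actual = namespace[fn_name](*ast.literal_eval(input_repr))
    return actual == ast.literal_eval(predicted_repr)
```

Using `ast.literal_eval` keeps the comparison restricted to Python literals rather than arbitrary expressions.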

4.5 Mathematical Reasoning

(4.5 Mathematical Reasoning - see original) Mathematical reasoning is evaluated on the grade-school benchmark GSM8K and on competition-level benchmarks including MATH, AIME 2024, and Math Odyssey, all under zero-shot chain-of-thought prompting.
原文: To assess the mathematical reasoning capabilities of DeepSeek-Coder-V2, we utilized the popular grade-school benchmark GSM8K (Cobbe et al., 2021), along with advanced competition-level benchmarks including MATH (Hendrycks et al., 2021), the American Invitational Mathematics Examination (AIME) 2024 (MAA, 2024), and Math Odyssey (Netmind.AI, 2024). (Footnote: The performance of DeepSeek-Coder-V2 on the four mathematical benchmarks was obtained with zero-shot chain-of-thought prompting; each test question was concatenated with the instruction: "\nPlease reason step by step, and put...
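Zero-shot chain-of-thought scoring concatenates each question with the instruction and then reads off the final boxed answer. A sketch in which the instruction string is an assumed completion of the truncated prompt above (not the verbatim text from the paper), and `extract_boxed` is a hypothetical helper:

```python
import re
from typing import Optional

# Assumed paraphrase; the verbatim instruction is truncated in the excerpt.
COT_INSTRUCTION = ("\nPlease reason step by step, "
                   "and put your final answer within \\boxed{}.")

def build_prompt(question: str) -> str:
    return question + COT_INSTRUCTION

def extract_boxed(completion: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} (no nesting handled)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1] if matches else None
```

Taking the last boxed span follows the convention that the final answer appears at the end of the chain of thought.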

4.6 General Natural Language

(4.6 General Natural Language - see original) Built upon DeepSeek-V2, DeepSeek-Coder-V2 inherits strong natural language ability and even surpasses DeepSeek-V2 on reasoning-related benchmarks. DeepSeek-Coder-V2-Instruct is compared with DeepSeek-V2 Chat on standard English and Chinese benchmarks such as BBH, MMLU, ARC, TriviaQA, NaturalQuestions, AGIEval, CLUEWSC, C-Eval, and CMMLU.
原文: As DeepSeek-Coder-V2 is built upon DeepSeek-V2, it inherits the strong natural language capability, even surpassing DeepSeek-V2 on reasoning-related benchmarks. We compare DeepSeek-Coder-V2 Instruct with DeepSeek-V2 Chat on standard benchmarks, which covers both English and Chinese benchmarks, including BigBench Hard (BBH) (Suzgun et al., 2022 ) , MMLU (Hendrycks et al., 2020 ) , ARC (Clark et al., 2018 ) , TriviaQA (Joshi et al., 2017 ) , NaturalQuestions (Kwiatkowski et al., 2019 ) , AGIEval (Zhong et al., 2023 ) , CLUEWSC (Xu et al., 2020 ) , C-Eval (Huang et al., 2023 ) , and CMMLU (Li et ...

(4.6 General Natural Language - see original) DeepSeek-Coder-V2-Lite-Instruct outperforms DeepSeek-V2-Lite-Chat on reasoning-heavy benchmarks such as BBH and Arena-Hard, but trails on knowledge-intensive benchmarks like TriviaQA because relatively less web data was used in pre-training. At the 236B scale, DeepSeek-Coder-V2-Instruct is especially strong on Arena-Hard, which contains a large share of code, math, and reasoning questions.
原文: -Instruct outperforms DeepSeek-V2-Lite-Chat in benchmarks like BBH and Arena-Hard. These benchmarks place a high demand on the model’s reasoning ability, which DeepSeek-Coder-V2-Lite-Instruct excels at. However, DeepSeek-Coder-V2-Lite-Instruct falls behind in knowledge-intensive benchmarks like TriviaQA, primarily due to the relatively smaller amount of web data used during pre-training. Moving on to 236B models, DeepSeek-Coder-V2-Instruct exhibits greater strength in reasoning benchmarks, particularly in Arena-Hard, which comprises a substantial proportion of code, math, and reasoning questio...

5 Conclusion

(5 Conclusion - see original) DeepSeek-Coder-V2 is continually pre-trained from DeepSeek-V2 on 6 trillion additional tokens from a high-quality, multi-source corpus, markedly improving coding and mathematical reasoning while keeping general language performance comparable to DeepSeek-V2. Supported programming languages grow from 86 to 338, and the context length is extended.
原文: In this paper, we introduce DeepSeek-Coder-V2 to further advance the field of code intelligence, which is continually pre-trained from DeepSeek-V2 with 6 trillion tokens sourced from a high-quality and multi-source corpus. Through this continued pre-training, we find that DeepSeek-Coder-V2 significantly enhances the model’s capabilities in coding and mathematical reasoning while maintaining comparable general language performance to DeepSeek-V2. Compared to DeepSeek-Coder, DeepSeek-Coder-V2 supports a significantly larger number of programming languages, increasing from 86 to 338, and extends ...

Appendix A Supported Programming Languages

(Appendix A Supported Programming Languages - see original) Appendix A enumerates the 338 programming languages supported by DeepSeek-Coder-V2; an excerpt of the list follows.
原文: ABAP, ActionScript, Ada, Agda, AGS Script, Alloy, AmbientTalk, AMD GPU, AMPL, ANSYS Parametric Design Language, ANTLR, Apache Configuration, APL, AppleScript, Arc, Arduino, ASP, AspectJ, Assembly, Asymptote, Augeas, AutoHotkey, AutoIt, AWK, BC, Berry, BitBake, BlitzBasic, BlitzMax, Bluespec, BNF, Boo, Boogie, Brainfuck, BrightScript, Bro, BST, C, C#, C2HS Haskell, CADL, CapDL, Ceylon, Chapel, ChucK, Cirru, Click, Clojure, CMake, COBOL, COBOLFree, CoffeeScript, ColdFusion CFC, Common Lisp, C++, Crystal, Csound, Csound Score, CSS, CUDA, Cypher, Cython, Darcs Patch, Dart, DASM16, Debian Control F...

(Appendix A Supported Programming Languages - see original) The supported-language list continues below.
原文: l, Perl 6, PHP, Pike, PkgConfig, POD, Pony, POV-Ray, PowerShell, Praat, Processing, Propeller Spin, Protocol Buffer, Pug, Puppet, PureBasic, PureScript, Python, Q, QML, QVTO, R, Racket, Ragel in Ruby Host, RAML, RConsole, Rd, REALbasic, ReasonML, Red, RenderScript, Ren’Py, REXX, RHTML, Ride, Robot Framework, Rouge, Ruby, Rust, S, Sage, SARL, SAS, Sass, Scala, Scheme, Scilab, SCSS, Self, Shell, ShExC, Sieve, Silver, Singularity, Slim, Smali, Smarty, Smithy, SMT, Solidity, SourcePawn, SPARQL, SQF, SQL, Squirrel, Stan, Standard ML, Stata, Stylus, SuperCollider, Swift, SWIG, SystemVerilog, Tcl, Tc...