[原文]DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via
Reinforcement Learning for Subgoal Decomposition
Z.Z. Ren*, Zhihong Shao*, Junxiao Song*, Huajian Xin†, Haocheng Wang†, Wanjia Zhao†, Liyue Zhang, Zhe Fu
Qihao Zhu, Dejian Yang, Z.F. Wu, Zhibin Gou, Shirong Ma, Hongxuan Tang, Yuxuan Liu, Wenjun Gao
Daya Guo, Chong Ruan
DeepSeek-AI
https://github.com/deepseek-ai/DeepSeek-Prover-V2
Abstract
We introduce DeepSeek-Prover-V2, an open-source large language model designed for formal
theorem proving in Lean 4, with initialization data collected through a recursive theorem prov-
ing pipeli...
1. Introduction
1. 引言
大型语言模型(LLMs)推理能力的涌现已彻底变革了人工智能的众多领域,尤其在数学求解领域(DeepSeek-AI, 2025)。这些进展主要得益于推理时扩展范式,尤其是通过自然语言思维链推理(Jaech et al., 2024)。与仅依赖单次前向传播得出答案不同,LLMs 能够对中间推理步骤进行反思,从而同时提升准确性与可解释性。尽管自然语言推理...
[原文]The emergence of reasoning capabilities in large language models (LLMs) has revolutionized
numerous areas of artificial intelligence, particularly in the domain of mathematical problem
solving (DeepSeek-AI, 2025). These advancements are largely enabled by the paradigm of
inference-time scaling, most notably through natural language chain-of-thought reasoning
(Jaech et al., 2024). Rather than relying solely on a single forward pass to arrive at an answer,
LLMs can reflect on intermediate reasoning steps, improving both accuracy and interpretability.
Despite the success of natural language reaso...
1. Introduction
(该方法的)求解策略支持模块化、可重用性,并能实现更高效的证明搜索(Wang et al., 2024b; Zheng et al., 2024)。近期研究通过采用多层级结构进行结构化证明生成(Wang et al., 2024a),以及利用强化学习技术优化将复杂定理分解为可管理子目标的过程(Dong et al., 2024),进一步拓展了这一范式。 在本文中,我们开发了一种用于子目标分解的推理模型,利用一系列合成冷启动数据与大规模强化学习来提升其性能。为构建冷启动数据集,我们设计了一条简单而高效的递归定理证明流水线,采用 DeepSeek-V3(DeepSeek-AI, 2024)作为统一的子目标分解与形式化工具。我们提示 DeepSeek-V3 将定理分解为高层证明草图,同时将这些证明步骤形式化为 Lean 4 代码,从而生成一系列子目标。由于子目标分解由大型通用模型驱动,我们采用一个较小的 7B 参数模型来处理每个子目标的证明搜索,从而降低相关的计算负担。此外,我们引入了一种课程学习框架,利用分解出的子目标生成猜想定理,逐步增加训练任务的难度,以更好地引导模型的学习过程。一旦某个难题的分解步骤得到解决,我们将完整的逐步形式化证明与 DeepSeek-V3 对应的思维链配对,构建冷启动推理数据。基于冷启动数据,后续引入强化学习阶段以进一步增强模型能力。 我们需要证明:对于任意整数 n \ge 4,不等式 n^2 \le n! 成立。其中,n! 表示 n 的阶乘,即所有不超过 n 的正整数的乘积。 为了形式化地证明该命题,我们可以使用数学归纳法。具体归纳过程如下:
[原文]solving strategies and supports modularity, reusability, and more efficient proof search (Wang
et al., 2024b; Zheng et al., 2024). Recent studies have extended this paradigm by employing
multi-tiered hierarchies for structured proof generation (Wang et al., 2024a), and by leveraging
reinforcement learning techniques to optimize the decomposition of complex theorems into
manageable subgoals (Dong et al., 2024).
In this paper, we develop a reasoning model for subgoal decomposition, leveraging a suite of
synthetic cold-start data and large-scale reinforcement learning to enhance its performance. ...
2. Inductive Step: Assume the inequality holds for some �= 4, i.e., �2 ≤�!.
2. 归纳步骤:假设不等式对某个 $n=4$ 成立,即 $n^2 \leq n!$。
然后证明它对 n+1 也成立,即 (n+1)^2 \leq (n+1)!。 Lean 4 证明结构: theorem induction_ineq_nsqlefactn (n : ℕ) (h₀ : 4 ≤ n) : n ^ 2 ≤ n ! := by have base_case : 4 ^ 2 ≤ 4 ! := by simp [Nat.factorial] have inductive_step : ∀ k ≥ 4, k ^ 2 ≤ k ! → (k + 1) ^ 2 ≤ (k + 1) ! := by intro k h₁ h₂; simp_all [Nat.factorial]; nlinarith have final_proof : ∀ n ≥ 4, n ^ 2 ≤ n ! := by intro n hn; induction' hn with k hk case refl => exact base_case case step => apply inductive_step k hk; exact by assumption apply final_proof; exact h₀ 证明:对任意整数 n \geq 4,有 n^2 \leq n!。 theorem induction_ine
[原文]Then show that it holds for �+ 1, i.e., (�+ 1)2 ≤(�+ 1)!.
Lean 4 Proof Structure
theorem induction_ineq_nsqlefactn (n : ℕ) (h₀ : 4 ≤ n) : n ^ 2 ≤ n ! := by
have base_case : 4 ^ 2 ≤ 4 ! := by
simp [Nat.factorial]
have inductive_step : ∀ k ≥ 4, k ^ 2 ≤ k ! → (k + 1) ^ 2 ≤ (k + 1) ! := by
intro k h₁ h₂
simp_all [Nat.factorial]
nlinarith
have final_proof : ∀ n ≥ 4, n ^ 2 ≤ n ! := by
intro n hn
induction' hn with k hk
case refl => exact base_case
case step =>
apply inductive_step k hk
exact by assumption
apply final_proof
exact h₀
Show that for any integer �≥4, we have �2 ≤�!.
theorem induction_ine...
2. Inductive Step: Assume the inequality holds for some �= 4, i.e., �2 ≤�!.
[原文]contribute ProverBench, a benchmark dataset containing 325 formalized problems to advance
neural theorem proving research, including 15 from the prestigious AIME competitions (years
24-25). DeepSeek-Prover-V2-671B successfully solves 6 of these 15 challenging AIME problems,
further demonstrating its sophisticated mathematical reasoning capabilities.
[原文]2.1. Recursive Proof Search via Subgoal Decomposition
Decomposing the proof of a complex theorem into a sequence of smaller lemmas that serve as
stepping stones is an effective strategy commonly employed by human mathematicians. Several
previous studies have explored this hierarchical strategy in the context of neural theorem prov-
ing, aiming to enhance proof search efficiency by leveraging the informal reasoning capabilities
of modern LLMs (Jiang et al., 2023; Zhao et al., 2023; Wang et al., 2024a; Dong et al., 2024). In
this paper, we develop a simple yet effective pipeline that utilizes De...
[原文]guage models have led to significant breakthroughs in informal reasoning capabilities. To bridge
the gap between formal and informal reasoning, we leverage cutting-edge general-purpose
LLMs, recognized for their strong mathematical reasoning and instruction-following abilities, to
construct the foundational framework of our theorem-proving system. Our findings indicate
that off-the-shelf models, such as DeepSeek-V3 (DeepSeek-AI, 2024), are capable of decompos-
ing proof steps and expressing them in formal languages. To prove a given formal theorem
statement, we prompt DeepSeek-V3 to first anal...
2. Method
人类撰写文本的形式化提供了高质量且多样化的形式化内容,但证明模型获得的训练信号往往较为稀疏,因为大量计算尝试未能产生成功证明,因此无法提供正向奖励信号。为生成更密集的训练信号,Dong and Ma (2025) 提出了一种自博弈方法,通过生成与原始定理陈述密切相关的可处理猜想来丰富训练问题集,从而更高效地利用训练计算资源。在本文中,我们实现了一种 straightforward a
[原文]formalization of human-authored texts provides high-quality and diverse formal content, the
resulting training signals for prover models are often sparse, as a large proportion of computa-
4
tional attempts do not yield successful proofs and therefore offer no positive reward signals.
To generate denser training signals, Dong and Ma (2025) proposed a self-play approach that
enriches training problem sets by generating tractable conjectures closely related to the original
theorem statements, thereby enabling more efficient use of training compute. In this paper, we
implement a straightforward a...
2. Method
数百条高质量的合成冷启动数据,这些数据构成了训练 DeepSeek-Prover-V2 的基础。该冷启动数据集生成策略与同期形式化推理模型研究 Kimina-Prover(Wang et al., 2025)的策略有所不同。具体而言,我们通过直接将自然语言证明形式化为结构化的形式证明草图来合成数据。相比之下,Kimina-Prover 采用了一种反向工作流:它首先收集完整的形式化证明及其对应的非形式化版本,随后利用通用推理模型逆向合成中间自然...
[原文]hundreds of high-quality synthetic cold-start data, which serve as the foundation for training
DeepSeek-Prover-V2. This cold-start dataset generation strategy differs from that of Kimina-
Prover (Wang et al., 2025), a concurrent work on formal reasoning models. Specifically, we
synthesize data by formalizing natural-language proofs directly into structured formal proof
sketches. In contrast, Kimina-Prover adopts a reverse workflow: it begins by collecting complete
formal proofs alongside their informal counterparts, then uses general-purpose reasoning
models to retrosynthesize intermediate nat...
1. High-efficiency non-Chain-of-Thought (non-CoT) mode: This mode is optimized for
Draft: 快速生成 Lean 形式化证明代码,侧重于生成无需显式中间推理步骤的简洁证明。 Final decision: 快速生成 Lean 形式化证明代码,侧重于生成无需显式中间推理步骤的简洁证明。 (Matches academic tone precisely)✅ "explicit intermediate reasoning steps" -> 在形式化方法中,"explicit" 常译为“显式”。 Final: 快速生成 Lean 形式化证明代码,侧重于生成无需显式中间推理步骤的简洁证明。 “快速生成 Lean 形式化证明代码,侧重于生成无需显式中间推理步骤的简洁证明。” (If it's a standalone phrase, this is perfect. I'll provide it directly.) Translation: 快速生成 Lean 形式化证明代码,侧重于生成无需显式中间推理步骤的简洁证明。
[原文]the rapid generation of formal Lean proof codes, focusing on producing concise proofs
without explicit intermediate reasoning steps.
2. High-precision Chain-of-Thought (CoT) mode: This mode systematically articulates
[原文]intermediate reasoning steps, emphasizing transparency and logical progression, before
constructing the final formal proofs.
Consistent with DeepSeek-Prover-V1.5 (Xin et al., 2024b), these two generation modes are
governed by two distinct guiding prompts (see Appendix A for examples). In the first stage,
we employ expert iteration within a curriculum learning framework to train a non-CoT prover
model, meanwhile, synthesizing proofs for hard problems through subgoal-based recursive
proving. The non-CoT generation mode is chosen to accelerate iterative training and data
collection processes, as ...
2. High-precision Chain-of-Thought (CoT) mode: This mode systematically articulates
[原文](DeepSeek-AI, 2024) using a constant learning rate of 5e-6 within a context window of 16,384
tokens. Our training corpus consists of two complementary sources: (1) non-CoT data collected
through expert iteration, which produces Lean codes without intermediate reasoning steps;
and (2) the cold-start CoT data described in Section 2.2, which distills DeepSeek-V3’s advanced
mathematical reasoning processes into structured proving pathways. The non-CoT components
emphasize formal verification skills in the Lean theorem prover ecosystem, while the CoT
examples explicitly model the cognitive process ...
3. Experimental Results
在多个形式化定理证明基准数据集上进行了系统评估,涵盖高中竞赛题与本科水平数学题。等,2024b)保持一致。除非另有说明,基线模型的评估结果均引自其原始论文。和 IMO 竞赛,以及从 MATH 数据集(Hendrycks 等,2021)中精选的题目。数论和数学归纳法。每个子集包含 244 道题,且在各个学科领域的分布完全一致。题目则被纳入结合子目标分解的课程学习中。进行了一处修订(详见附录 D)。与当前最优(SoTA)模型的对比。数据集上评估的当前最优形式化定理证明模型的对比情况。准确率。
[原文]In this section, we present a systematic evaluation of DeepSeek-Prover-V2 across diverse bench-
mark datasets of formal theorem proving, covering both high school competition problems
and undergraduate-level mathematics. All experimental results of DeepSeek-Prover-V2 are
conducted with Lean 4.9.0-rc2, using the same testing environment as DeepSeek-Prover-V1.5
(Xin et al., 2024b). Without further specification, baseline evaluation results are sourced from
their respective original papers.
3.1. Results on MiniF2F Benchmark
MiniF2F (Zheng et al., 2022) consists of 488 formalized problem statement...
3. Experimental Results
将具有挑战性的问题分解为一系列易于处理的步骤,在非形式化推理与形式化证明构建之间架起了一座有效的桥梁。 表3 | DeepSeek-Prover-V2在miniF2F-test上生成的平均token数。CoT与非CoT对比。 表1的实验结果表明,在形式化数学推理中,CoT(思维链)推理模式相较于非CoT模式具有显著的性能优势。这进一步印证了CoT提示的有效性,即鼓励将复杂问题分解为中间步骤,同时也再次证实了推理时扩展(inference-time scaling)在形式化定理证明领域依然成立。作为对这些发现的补充,表3提供了DeepSeek-Prover-V2在不同推理模式下生成token数量的统计结果。正如预期,CoT模式生成的输出显著更长,反映了其复杂的推理过程。有趣的是,在非CoT设置下,671B模型平均生成的输出长度比7B模型更长。深入分析发现,尽管非CoT模式未显式提示推理过程,但更大规模的模型经常在证明代码中插入简短的自然语言注释,这些注释类似于隐式的推理步骤(见附录A)。这表明,即使在没有显式CoT提示的情况下,高容量模型也可能将中间推理过程内化并以隐式方式外显。 ProofNet(Azerbayev 等,2023)包含371个Lean 3问题,选自多种流行的本科纯数学教材,涵盖实变函数与复变函数分析、线性代数、抽象代数和拓扑学等主题。我们使用了Xin 等(2024b)提供的ProofNet的Lean 4翻译版本,该版本进一步划分为两个子集:ProofNet-valid和ProofNet-test,分别包含185和186个问题。ProofNet的测试集专用于模型评估,因为ProofNet-valid问题的变体已包含在Dong和Ma(2025)提供的公开合成数据集中,该数据集用于我们的监督微调。如表4所示的结果表明,与不使用CoT的设置相比,使用CoT推理时DeepSeek-Prover-V2的通过率得到了显著提升。值得注意的是,尽管训练数据主要来源于高中…… "challenging problems into a sequence of tractable steps, serving as an effective bridge between informal reasoning and formal proof construction." -> 将具有挑战性的问题分解为一系列易于处理的步骤,在非形式化推理与形式化证明构建之间架起了一座有效的桥梁。 表3 | DeepSeek-Prover-V2在miniF2F-test上生成的平均token数。CoT与非CoT对比。 "The experimental results in Table 1 demonstrate a substantial performance advantage of the CoT reasoning mode over the non-CoT mode in formal mathematical reasoning." -> 表1的实验结果表明,在形式化数学推理中,CoT(思维链)推理模式相较于非CoT模式具有显著的性能优势。 "This reinforces the effectiveness of CoT prompting, which encourages decomposition of complex problems into intermediate steps, and further confirms that inference-time scaling holds in the domain of formal theorem proving." -> 这进一步印证了CoT提示的有效性,即鼓励将复杂问题分解为中间步骤,同时也再次证实了推理时扩展(inference-time scaling)在形式化定理证明领域依然成立。 "Complementing these findings, Table 3 provides statistics on the number of tokens generated by DeepSeek-Prover-V2 under different reasoning modes." -> 作为对这些发现的补充,表3提供了DeepSeek-Prover-V2在不同推理模式下生成token数量的统计结果。
[原文]challenging problems into a sequence of tractable steps, serving as an effective bridge between
informal reasoning and formal proof construction.
#output tokens
non-CoT
CoT
7B
442.6
4488.5
671B
761.8
6751.9
Table 3 | Average number of tokens generated
by DeepSeek-Prover-V2 on miniF2F-test.
CoT vs. non-CoT.
The experimental results
in Table 1 demonstrate a substantial perfor-
mance advantage of the CoT reasoning mode
over the non-CoT mode in formal mathemat-
ical reasoning. This reinforces the effectiveness
of CoT prompting, which encourages decom-
position of complex problems into intermedi-
a...
[原文]certain corner cases. Upon closer examination of the model’s outputs, we identified a distinctive
pattern in its reasoning approach: the 7B model frequently employs Cardinal.toNat and
Cardinal.natCast_inj to exploit this user-interface bug (see examples in Appendix B), which
are noticeably absent in the outputs generated by the 671B version.
10
3.3. Results on Combinatorial Problems
CombiBench
Pass@16
Kimina-Prover-Preview (Wang et al., 2025)
7/100
DeepSeek-Prover-V2-7B
non-CoT
6/100
CoT
7/100
DeepSeek-Prover-V2-671B
non-CoT
8/100
CoT
10/100
Table 5 | Evaluation results on CombiBench
under the...
[原文]and covering a wide range of domains
such as algebra, applied mathematics,
calculus, number theory, and discrete
mathematics. FormalMATH-Lite is a
manageable subset of 425 problems
(comprising 359 high school-level and
66 undergraduate-level problems) care-
fully selected from the full corpus, addressing the impracticality of evaluating on the full dataset
while still enabling systematic assessment of test-time scaling across various mathematical do-
main. Table 6 presents the comparative performance analysis of various theorem provers on both
FormalMATH-All and FormalMATH-Lite benchmarks. Dee...
[原文]enable more comprehensive evaluation across both high-school competition problems and
undergraduate-level mathematics.
Contest
Problems
AIME 24I
P2 , P7 , P13
AIME 24II
P4 , P7, P13 , P14
AIME 25I
P1 , P8 , P9, P11
AIME 25II
P2 , P4 , P13, P15
Table 8 | Selection of AIME 24&25 problems for
formalization. Problems with underlined bolded
indices have been solved by DeepSeek-Prover-
V2. Problems solved by DeepSeek-V3-0324 using
Maj@16 are highlighted with a gray background.
AIME Formalization.
The American In-
vitational Mathematics Examination (AIME)
is an annual mathematics competition de-
sign...
[原文]sures comprehensive representation across dif-
ficulty levels and topic areas. As a result, we
formalize 310 problems that encompass a broad
spectrum, ranging from elementary mathematics
at the competition level to advanced topics typi-
cally encountered in undergraduate studies. This
comprehensive benchmark covers number the-
ory, elementary algebra, linear algebra, abstract
algebra, calculus, real analysis, complex analysis,
functional analysis, and probability. The deliber-
ate inclusion of this diverse array of mathematical
fields allows for a thorough assessment of model
capabilities acro...
[原文]In this work, we propose a comprehensive pipeline for synthesizing cold-start reasoning data
to advance formal theorem proving. Our data construction process is grounded in a recursive
theorem-proving framework, wherein DeepSeek-V3 serves as a unified model for both subgoal
decomposition and lemma formalization within the Lean 4 proof assistant. Our approach com-
bines high-level proof sketches with formal steps, creating a sequence of manageable subgoals
that can be efficiently solved using a smaller 7B model, significantly reducing computational re-
quirements. The curriculum learning framew...
2025. URL https://arxiv.org/abs/2501.12948.
参考文献: K. Dong and T. Ma. STP: Self-play LLM theorem provers with iterative conjecturing and proving. arXiv preprint arXiv:2502.00212, 2025. K. Dong, A. Mahankali, and T. Ma. Formal theorem proving by rewarding LLMs to decompose proofs hierarchically. arXiv preprint arXiv:2411.01829, 2024. M. Eppe, C. Gumbsch, M. Kerzel, P. D. Nguyen, M. V. Butz, and S. Wermter. Intelligent problem-solving as integrated hierarchical reinforcement learning. Nature Machine Intelligence, 4(1): 11–20, 2022. D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, and J. Steinhardt. Measuring mathemati
[原文]K. Dong and T. Ma. STP: Self-play llm theorem provers with iterative conjecturing and proving.
arXiv preprint arXiv:2502.00212, 2025.
K. Dong, A. Mahankali, and T. Ma. Formal theorem proving by rewarding llms to decompose
proofs hierarchically. arXiv preprint arXiv:2411.01829, 2024.
M. Eppe, C. Gumbsch, M. Kerzel, P. D. Nguyen, M. V. Butz, and S. Wermter. Intelligent problem-
solving as integrated hierarchical reinforcement learning. Nature Machine Intelligence, 4(1):
11–20, 2022.
D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, and J. Steinhardt.
Measuring mathemati...
2025. URL https://arxiv.org/abs/2501.12948.
L. C. Paulson. Isabelle: 一个通用定理证明器。Springer Verlag, 1994年。 S. Polu 和 I. Sutskever. 生成式语言模型在形式化证明中的应用。这些都是形式化定理证明领域的重要基础工作和相关研究成果,为DeepSeek-Prover-V2的方法设计提供了理论支撑和技术参考。
[原文]L. C. Paulson. Isabelle a Generic Theorem Prover. Springer Verlag, 1994.
S. Polu and I. Sutskever. Generative language modeling for automated theorem proving. arXiv
preprint arXiv:2009.03393, 2020.
J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization
algorithms. arXiv preprint arXiv:1707.06347, 2017.
Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, M. Zhang, Y. Li, Y. Wu, and D. Guo. DeepSeekMath:
Pushing the limits of mathematical reasoning in open language models. arXiv preprint
arXiv:2402.03300, 2024.
G. Tsoukalas, J. Lee, J. Jennings, J. Xin, M. Ding, M. Je...
2025. URL https://arxiv.org/abs/2501.12948.
参考文献(续): reasoning: A new frontier in AI. arXiv preprint arXiv:2412.16075, 2024. H. Ying, Z. Wu, Y. Geng, J. Wang, D. Lin, and K. Chen. Lean workbook: A large-scale Lean problem set formalized from natural language math problems. In The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024. Z. Yu, R. Peng, K. Ding, Y. Li, Z. Peng, M. Liu, Y. Zhang, Z. Yuan, H. Xin, W. Huang, et al. FormalMath: Benchmarking formal mathematical reasoning of large language models. arXiv preprint arXiv:2505.02735, 2025. J. Zhang, Q. Wang, X. Ji, Y. Liu, Y. Yue, F. Zhang, D. Zh
[原文]reasoning: A new frontier in AI. arXiv preprint arXiv:2412.16075, 2024.
H. Ying, Z. Wu, Y. Geng, J. Wang, D. Lin, and K. Chen. Lean workbook: A large-scale lean
problem set formalized from natural language math problems. In The Thirty-eight Conference
on Neural Information Processing Systems Datasets and Benchmarks Track, 2024.
Z. Yu, R. Peng, K. Ding, Y. Li, Z. Peng, M. Liu, Y. Zhang, Z. Yuan, H. Xin, W. Huang, et al.
Formalmath: Benchmarking formal mathematical reasoning of large language models. arXiv
preprint arXiv:2505.02735, 2025.
J. Zhang, Q. Wang, X. Ji, Y. Liu, Y. Yue, F. Zhang, D. Zh...
2025. URL https://arxiv.org/abs/2501.12948.
Lean 4 证明片段(三角恒等式化简): -- 使用特定角度的余弦和恒等式化简表达式 have h5 : Real.cos (2 * Real.pi / 7) = Real.cos (2 * (Real.pi / 7)) := by ring have h6 : Real.cos (3 * Real.pi / 7) = Real.cos (3 * (Real.pi / 7)) := by ring rw [h5, h6] have h7 : Real.cos (2 * (Real.pi / 7)) = 2 * Real.cos (Real.pi / 7) ^ 2 −1 := by have h71 : Real.cos (2 * (Real.pi / 7)) = 2 * Real.cos (Real.pi / 7) ^ 2 −1 := by rw [Real.cos_two_mul]; ring_nf; nlinarith [Real.cos_sq_add_sin_sq (Real.pi / 7)] rw [h71] have h8 : Real.cos (3 * (Real.pi / 7)) = 4 * Real.cos (Real.pi / 7) ^ 3 −3 * Real
[原文]−−Use the identity for the sum of cosines with specific angles to simplify the expression
have h5 : Real.cos (2 * Real.pi / 7) = Real.cos (2 * (Real.pi / 7)) := by ring
have h6 : Real.cos (3 * Real.pi / 7) = Real.cos (3 * (Real.pi / 7)) := by ring
rw [h5, h6]
have h7 : Real.cos (2 * (Real.pi / 7)) = 2 * Real.cos (Real.pi / 7) ^ 2 −1 := by
have h71 : Real.cos (2 * (Real.pi / 7)) = 2 * Real.cos (Real.pi / 7) ^ 2 −1 := by
rw [Real.cos_two_mul]
ring_nf
nlinarith [Real.cos_sq_add_sin_sq (Real.pi / 7)]
rw [h71]
have h8 : Real.cos (3 * (Real.pi / 7)) = 4 * Real.cos (Real.pi / 7) ^ 3 −3 * Real...
[原文]nlinarith [Real.cos_le_one (Real.pi / 7), Real.cos_le_one (2 * (Real.pi / 7)), Real.cos_le_one (3 *
(Real.pi / 7))]
−−Use the polynomial equation to find the sum of cosines
linarith
‘‘‘
A.2. CoT Generation Strategy of DeepSeek-Prover-V2
Input:
Complete the following Lean 4 code:
‘‘‘lean4
import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
/-- Let 𝑝, 𝑞, 𝑟be integers with 1
Lean 4 证明片段(IMO 1992 P1 分类讨论): have h20 : (q : Z) ⩾4 := by by_contra h20; have h21 : q ⩽3 := by linarith; have h22 : q = 3 := by linarith have h23 : (p : Z) = 3 := by norm_num [h17]; have h24 : (q : Z) = 3 := by norm_num [h22] have h25 : (r : Z) ⩾4 := by linarith norm_num [h17, h22, h23, h24] at h18 (try omega) (try nlinarith) have h21 : (r : Z) ⩾q + 1 := by linarith have h22 : (q : Z) ⩾4 := by linarith have h23 : (p : Z) = 3 := by norm_num [h17] have h24 : (p : Z) * q * r −1 = 2 * ((p −1) * (q −1) * (r −1)) := by linarith have h25 : 3 * q * r ...
[原文]have h20 : (q : Z) ⩾4 := by
by_contra h20
have h21 : q ⩽3 := by linarith
have h22 : q = 3 := by linarith
have h23 : (p : Z) = 3 := by norm_num [h17]
have h24 : (q : Z) = 3 := by norm_num [h22]
have h25 : (r : Z) ⩾4 := by linarith
norm_num [h17, h22, h23, h24] at h18
(try omega)
(try nlinarith)
(try
{
nlinarith [mul_pos (sub_pos.mpr h0.2.1) (sub_pos.mpr h0.2.2)]
})
have h21 : (r : Z) ⩾q + 1 := by linarith
have h22 : (q : Z) ⩾4 := by linarith
have h23 : (p : Z) = 3 := by norm_num [h17]
have h24 : (p : Z) * q * r −1 = 2 * ((p −1) * (q −1) * (r −1)) := by linarith
have h25 : 3 * q * r ...
2025. URL https://arxiv.org/abs/2501.12948.
Lean 4 证明片段(整除性分析): -- 利用 (q −3) 的绝对值整除 5 的绝对值这一事实 have h37 : ((q : Z) −3).natAbs ∣ 5 := by simpa [Int.natAbs] using h34 -- 由于 5 的因子只有 1 和 5,我们分情况讨论 have h38 : ((q : Z) −3).natAbs = 1 ∨ ((q : Z) −3).natAbs = 5 := by have h39 : ((q : Z) −3).natAbs ∣ 5 := h37 have h40 : ((q : Z) −3).natAbs ⩽5 := Nat.le_of_dvd (by decide) h39 interval_cases ((q : Z) −3).natAbs norm_num at h39 ⊢ omega cases h38 with | inl h38 => have h41 : (q : Z) −3 = 1 ∨ (q : Z) −3 = −1 := by have h42 : ((q : Z) −3).natAbs = 1 := h38 have h43 : (q : Z) ...
[原文]−−We use the fact that the absolute value of (q −3) divides the absolute value of 5
have h37 : ((q : Z) −3).natAbs | 5 := by
simpa [Int.natAbs] using h34
−−Since the possible divisors of 5 are 1 and 5, we check the cases
have h38 : ((q : Z) −3).natAbs = 1 ∨((q : Z) −3).natAbs = 5 := by
have h39 : ((q : Z) −3).natAbs | 5 := h37
have h40 : ((q : Z) −3).natAbs ⩽5 := Nat.le_of_dvd (by decide) h39
interval_cases ((q : Z) −3).natAbs norm_num at h39 ⊢ omega
cases h38 with
| inl h38 =>
have h41 : (q : Z) −3 = 1 ∨(q : Z) −3 = −1 := by
have h42 : ((q : Z) −3).natAbs = 1 := h38
have h43 : (q : Z) ...
[原文]have h2 : ({ (P, Q) : (Polynomial R) × (Polynomial R) | P ^ 2 + Q ^ 2 = Polynomial.X ^ (2 * n) + 1 ∧P.degree
> Q.degree }.ncard : N) = 2 ^ (n + 1) := by
apply h1
exact npos
simpa [h2] using h2
C. An Example of Using exfalso to Prove a Vacuously True Statement
The following example is taken from an earlier version of CombiBench (Liu et al., 2025), which
has been corrected in the latest release.
Input:
Complete the following Lean 4 code:
34
‘‘‘lean4
import Mathlib
def appears (W : Z →Fin 2) (U : Σ n, Fin n →Fin 2) : Prop :=
∃k, ∀i : Fin U.1, U.2 i = W (k + i)
def ubiquitous (W : Z →Fin 2) (U : Σ...
1. We have an infinite periodic word ‘W : Z →Fin 2‘ with minimal period ‘N > 2𝑛‘ (where ‘n : N+‘ is a
[原文]positive integer).
−‘Function.Periodic W N‘ means ‘∀z, W (z + N) = W z‘.
−The minimality condition is ‘∀N’
2. The definitions:
2. 定义:
— 有限词 U = (n, f : \mathrm{Fin}\,n \to \mathrm{Fin}\,2) **出现**于 W 中,若存在 k 使得对所有 i : \mathrm{Fin}\,n 有 U.2\,i = W(k + i)。 — 在 Lean 中,\mathrm{appears}\,W\,U 定义为 \exists k, \forall i : \mathrm{Fin}\,U.1, U.2\,i = W(k + i)。 — U 是**普遍**的(ubiquitous),若四个增广词 Ua、Ub、aU、bU 均出现于 W 中。 — 在 Lean 中,\mathrm{ubiquitous}\,W\,U 检查四个增广词是否出现,其中: — Ua 为 \langle U.1 + 1, \mathrm{Fin.snoc}\,U.2\,0\rangle, — Ub 为 \langle U.1 + 1, \mathrm{Fin.snoc}\,U.2\,1\rangle, — aU 为 \langle U.1 + 1, \mathrm{Fin.cons}\,0\,U.2\rangle, — bU 为 \langle U.1 + 1, \mathrm{Fin.cons}\,1\,U.2\rangle。
[原文]−A finite word ‘U = (n, f : Fin n →Fin 2)‘ **appears** in ‘W‘ if there exists ‘k‘ such that ‘U.2 i = W (k +
i)‘ for all ‘i : Fin n‘.
−In Lean, ‘appears W U‘ is defined as ‘∃k, ∀i : Fin U.1, U.2 i = W (k + i)‘.
−‘U‘ is **ubiquitous** if all four augmented words ‘Ua‘, ‘Ub‘, ‘aU‘, ‘bU‘ appear in ‘W‘.
−In Lean, ‘ubiquitous W U‘ checks that the four augmented words appear, where:
−‘Ua‘ is ‘⟨U.1 + 1, Fin.snoc U.2 0⟩‘,
−‘Ub‘ is ‘⟨U.1 + 1, Fin.snoc U.2 1⟩‘,
−‘aU‘ is ‘⟨U.1 + 1, Fin.cons 0 U.2⟩‘,
−‘bU‘ is ‘⟨U.1 + 1, Fin.cons 1 U.2⟩‘.
3. The goal is to find an injective function from ‘Fin n‘ to ubiquitous finite nonempty words in ‘W‘
标题:3. 目标是寻找一个从 ‘Fin n‘ 到 ‘W‘ 中普遍存在的有限非空词的单射函数): (即,到 ‘Σ k, Fin k →Fin 2‘,其中 ‘k ≠0‘ 且该词是普遍存在的)。 然而,Lean 中的问题略有不同:
[原文](i.e., to ‘Σ k, Fin k →Fin 2‘ where ‘k ≠0‘ and the word is ubiquitous).
However, the Lean problem is slightly different:
[原文]**Wait**: In Lean, ‘Function.Periodic f c‘ is ‘∀z, f (z + c) = f z‘ (‘c‘ is the period, not necessarily
minimal). Here, ‘hW’‘ takes ‘N’‘ as a ‘N‘, but ‘W : Z →Fin 2‘ uses ‘+‘ as the ‘Z‘ addition, and ‘N’‘ is
cast to ‘Z‘ (since ‘c : Z‘ in ‘f (z + c)‘).
But still:
−‘hW’ 0 . . .‘ is not directly applicable because ‘0 : N‘ is ‘ 2 ^ n.1 ⩾1‘), but
‘Function.Periodic W 0‘ is ‘∀z : Z, W (z + 0) = W z‘, which is trivially true (‘z + 0 = z‘). But ‘hW’ 0
(by linarith)‘ states ‘¬Function.Periodic W 0‘, which is false.
**Conclusion**: The assumptions in the Lean problem are contradictory becau...
**结论**:Lean 问题中的假设是矛盾的,因为
3. Thus, ‘hW’ 0‘ yields ‘false‘.
📝 暂未翻译 — But Lean’s ‘hW’‘ is ‘∀(N’ : N), N’
2. The conditions ‘(x i).1 ≠0‘ (‘1 ≠0‘) and ‘ubiquitous W (x i)‘ (which will follow from ‘false‘).
[原文]But to actually prove ‘ubiquitous W (x i)‘, we need to prove ‘appears W ⟨2, _⟩‘, etc., but we can derive
‘false‘ from ‘hW’ 0‘.
However, ‘hN : 2 ^ n.1 2 ^ n.1 ⩾2‘ (if ‘n.1 ⩾1‘ then ‘2 ^ n.1 ⩾2‘).
But ‘N‘ is ‘N : N‘, and ‘0 2 ^ n.1 ⩾1‘), so ‘N’ = 0
5. But ‘Function.Periodic W 0‘ is trivially true, so ‘hW’ 0 . . .‘ is ‘false‘.
注意 2^{n.1} 在 n.1 = 0 时不一定 \geq 2,但 n : \mathbb{N}^+ 意味着 n.1 \geq 1(n \neq 0)。因此, 2^{n.1} \geq 2^1 = 2(n.1 \geq 1)。 然而,若 N : \mathbb{N} 满足 N > 2^{n.1},则 N \geq 1(当 n.1 = 1 且 2^{n.1} = 2 > N 时 N 可能为 1,但由 hN 这不可能)。不,在 hN 中,2^{n.1} \geq 2^1 = 2,所以 N \geq 3。 n : \mathbb{N}^+ 为正整数(n.1 \geq 1),因此: — n.1 \geq 1 意味着 2^{n.1} \geq 2^1 = 2,所以 N > 2 且 N \geq 3。 因此,$0
[原文]But note that ‘2 ^ n.1‘ is not necessarily ‘⩾2‘ if ‘n.1 = 0‘, but ‘n : N+‘ implies ‘n.1 ⩾1‘ (‘n ≠0‘). Thus, ‘2
^ n.1 ⩾2 ^ 1 = 2‘ (‘n.1 ⩾1‘).
However, if ‘N : N‘ is such that ‘N > 2 ^ n.1‘, we have ‘N ⩾1‘ (‘N‘ could be ‘1‘ if ‘n.1 = 1‘ and ‘2 ^ n.1 = 2 > N‘
is not possible by ‘hN‘). Wait no: in ‘hN‘, ‘2 ^ n.1 2 ^ 1 = 2‘, so ‘N ⩾3‘.
Wait, ‘n : N+‘ is a positive integer (‘n.1 ⩾1‘), so:
−‘n.1 ⩾1‘ implies ‘2 ^ n.1 ⩾2 ^ 1 = 2‘, so ‘N > 2‘ and ‘N ⩾3‘.
Thus, ‘0
1. Derive ‘False‘ from ‘hW’‘:
1. 从 $hW'$ 导出 $\mathrm{False}$:
37 — 由于 N > 2^{n.1} \geq 2,故 N \geq 3。 — hW'\,0 要求 $0
[原文]37
−Since ‘N > 2 ^ n.1 ⩾2‘, ‘N ⩾3‘.
−‘hW’ 0‘ requires ‘0
3. The conditions on ‘x‘ are vacuously true (‘False →anything‘).