OpenAI

Learning to Reason with LLMs

📅 2024-09-12 · 👤 OpenAI · 📄 arXiv: 2501.12948

Reasoning · Reinforcement learning · Chain of thought · Mathematics · Programming

Abstract

o1 is a breakthrough in the reasoning capabilities of large language models. By applying reinforcement learning to chains of reasoning during training, the model achieves significant performance improvements on complex reasoning tasks in mathematics, science, and programming. The o1 series of models uses a large-scale reasoning training strategy and reaches a new state of the art on benchmarks such as AIME, MATH, and GPQA.
