OpenAI

Learning to Reason with LLMs

📅 2024-09-12 · 👤 OpenAI
Reasoning · Reinforcement learning · Chain of thought · Mathematics · Programming

Abstract

o1 is a breakthrough in the reasoning capabilities of large language models. By applying reinforcement learning to its reasoning chains during training, the model achieves significant performance gains on complex reasoning tasks in mathematics, science, and programming. The o1 series of models uses a large-scale reasoning training strategy and reaches new state-of-the-art results on benchmarks such as AIME, MATH, and GPQA.
