Anthropic

Constitutional AI: Harmlessness from AI Feedback


Date: 2022-12-15 | Authors: Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, et al. | arXiv: 2212.08073
Tags: AI safety | alignment | harmlessness | Constitutional AI | RLHF

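Abstract

Constitutional AI (CAI) is a method for training a harmless AI assistant without any human labels identifying harmful outputs; the only human oversight is a short list of principles, the "constitution". Training has two phases. In the supervised phase, the model generates self-critiques and revisions of its own responses according to the constitution, and the original model is fine-tuned on the revised responses. In the reinforcement learning phase, an AI model judges which of two sampled responses better follows the constitution, a preference model is trained on these AI preferences, and that preference model serves as the reward signal ("RL from AI Feedback", RLAIF). Compared with standard RLHF, this requires far fewer human labels, and it yields an assistant that is harmless but not evasive: it engages with harmful requests by explaining its objections rather than simply refusing.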
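The self-critique and revision loop, together with the AI-generated preference labels that stand in for human harmlessness labels, can be sketched as follows. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `Generate` and `LogProb` are hypothetical interfaces to a language model, every function name here is hypothetical, the principle strings are illustrative rather than the paper's constitution, and the preference-model training and RL steps themselves are omitted.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical model interfaces (assumptions, not a real library API):
#   Generate(prompt) -> sampled text
#   LogProb(prompt, continuation) -> log-probability of `continuation` given `prompt`
Generate = Callable[[str], str]
LogProb = Callable[[str, str], float]

# Illustrative principles only; the paper's constitution is longer and worded differently.
CRITIQUE_PRINCIPLES = [
    "Identify ways the response is harmful, unethical, or dangerous.",
    "Identify ways the response is evasive or unhelpful.",
]
COMPARISON_PRINCIPLES = [
    "Which response is less harmful?",
    "Which response is more honest and less evasive?",
]


def critique_and_revise(generate: Generate, prompt: str, rounds: int,
                        rng: random.Random) -> str:
    """Supervised phase: draft a response, then repeatedly critique and revise it."""
    response = generate(prompt)
    for _ in range(rounds):
        principle = rng.choice(CRITIQUE_PRINCIPLES)  # sample one principle per round
        critique = generate(f"{prompt}\nResponse: {response}\nCritique request: {principle}")
        response = generate(
            f"{prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Revision request: rewrite the response to address the critique."
        )
    return response


def build_sl_dataset(generate: Generate, prompts: List[str], rounds: int = 2,
                     seed: int = 0) -> List[Tuple[str, str]]:
    """(prompt, final revision) pairs; the original model would be fine-tuned on these."""
    rng = random.Random(seed)
    return [(p, critique_and_revise(generate, p, rounds, rng)) for p in prompts]


def ai_preference(logprob: LogProb, prompt: str, resp_a: str, resp_b: str,
                  principle: str) -> float:
    """RL phase: soft label P(A preferred) from the feedback model's choice log-probs."""
    query = f"{prompt}\n{principle}\n(A) {resp_a}\n(B) {resp_b}\nThe answer is:"
    la, lb = logprob(query, " (A)"), logprob(query, " (B)")
    m = max(la, lb)  # subtract the max before exponentiating, for numerical stability
    ea, eb = math.exp(la - m), math.exp(lb - m)
    return ea / (ea + eb)


def build_preference_dataset(generate: Generate, logprob: LogProb, prompts: List[str],
                             seed: int = 0) -> List[Tuple[str, str, str, float]]:
    """(prompt, A, B, P(A preferred)) tuples for training the preference model."""
    rng = random.Random(seed)
    data = []
    for p in prompts:
        a, b = generate(p), generate(p)  # two samples from the fine-tuned model
        principle = rng.choice(COMPARISON_PRINCIPLES)
        data.append((p, a, b, ai_preference(logprob, p, a, b, principle)))
    return data


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs; a real setup would call an actual language model.
    calls: List[str] = []

    def toy_generate(prompt: str) -> str:
        calls.append(prompt)
        return f"draft-{len(calls)}"

    def toy_logprob(prompt: str, continuation: str) -> float:
        return -1.0 if continuation == " (A)" else -2.0

    print(build_sl_dataset(toy_generate, ["How do I pick a lock?"]))
    # prints [('How do I pick a lock?', 'draft-5')]
    prefs = build_preference_dataset(toy_generate, toy_logprob, ["How do I pick a lock?"])
    print(round(prefs[0][3], 3))  # prints 0.731
```

Turning the feedback model's log-probabilities for options (A) and (B) into a normalized probability, rather than a hard A/B choice, keeps the model's uncertainty in the label that the preference model is trained on.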
