Constitutional AI: Harmlessness from AI Feedback
Constitutional AI is a method for training harmless AI models without relying on human feedback labels for harmlessness. It uses a set of principles (a constitution) to guide the AI in critiquing and revising its own outputs, which reduces the model's tendency to produce harmful responses. Because it does not require human harmlessness labels, the approach aims to be more efficient and less costly than traditional RLHF while improving the model's safety and harmlessness.
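The self-critique and revision step described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the two entries in `PRINCIPLES` are hypothetical and are not the actual constitution, the prompt templates are invented, and `toy_model` is a stand-in for a real language model so that the example runs offline. The sketch covers only how revisions are generated, not any training step.

```python
from typing import Callable, Sequence

# Hypothetical principles, written for this sketch only.
PRINCIPLES = (
    "Point out anything in the response that is harmful or unethical.",
    "Point out anything in the response that encourages illegal activity.",
)


def critique_and_revise(
    prompt: str,
    generate: Callable[[str], str],
    principles: Sequence[str] = PRINCIPLES,
) -> str:
    """Return a response that has been revised once per principle."""
    response = generate(prompt)  # initial, possibly harmful, draft
    for principle in principles:
        # Ask the model to critique its own response under one principle.
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique request: {principle}\nCritique:"
        )
        # Ask the model to rewrite the response in light of that critique.
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique: {critique}\n"
            "Revision request: Rewrite the response to address the critique.\n"
            "Revision:"
        )
    return response


def toy_model(text: str) -> str:
    """Stand-in for a language model (assumption: not a real model)."""
    if text.endswith("Critique:"):
        return "The response could be safer."
    if text.endswith("Revision:"):
        return "A safer response."
    return "An unfiltered first draft."
```

Calling `critique_and_revise("How do I pick a lock?", toy_model)` walks through one critique and one revision per principle. Passing the model as a plain callable keeps the loop independent of any particular model API, so a real model client could be substituted for `toy_model`.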