C-046 | Value Alignment and AI Ethics | Confidence: Medium

Tan Zhi-Xuan, Micah Carroll, Matija Franklin & Hal Ashton, Beyond Preferences in AI Alignment (2025)

One-Sentence Thesis

The authors challenge the dominant preferentist approach to AI alignment, which assumes that human preferences are an adequate representation of human values, and argue that AI systems should instead be aligned with normative standards appropriate to their social roles.

Argument Outline

1. Introduction to the preferentist approach to AI alignment
2. Critique of rational choice theory as a descriptive model of human decision-making
3. Critique of expected utility theory as a normative standard of rationality
4. Alternative approaches to AI alignment, including alignment with normative standards and social roles

Key Distinctions

Preferentist approach vs. alternative approaches to AI alignment

Descriptive vs. normative accounts of human decision-making

Key Terms

Preferentist approach

An approach to AI alignment that assumes human preferences are an adequate representation of human values

Rational choice theory

A theoretical framework that assumes human behavior can be modeled as the maximization of expected utility

Expected utility theory

A normative standard of rationality holding that rational agents ought to maximize expected utility
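
To make the last two terms concrete, here is a minimal sketch of what "maximizing expected utility" means computationally. It is an illustration only, not something taken from the paper: the function names, the umbrella scenario, and every probability and utility value are hypothetical. The point is simply that the framework represents an agent as scoring each action by a probability-weighted sum of outcome utilities and picking the highest score, which is the picture the reading critiques both as a description of human decision-making and as a standard of rationality.

```python
from typing import Dict

# A lottery maps each possible outcome to its probability.
Lottery = Dict[str, float]


def expected_utility(lottery: Lottery, utility: Dict[str, float]) -> float:
    """Probability-weighted sum of the utilities of the outcomes."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def best_action(actions: Dict[str, Lottery], utility: Dict[str, float]) -> str:
    """Return the action whose lottery has the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a], utility))


# Hypothetical numbers, chosen only for illustration.
utility = {"dry": 1.0, "wet": -2.0, "dry_but_encumbered": 0.5}
actions = {
    "take_umbrella": {"dry_but_encumbered": 1.0},
    "leave_umbrella": {"dry": 0.7, "wet": 0.3},
}

print(best_action(actions, utility))
```

With these made-up values, taking the umbrella scores 0.5 and leaving it scores roughly 0.1, so the sketch selects "take_umbrella". Note that everything the agent cares about has to be packed into a single utility number per outcome, which is the kind of representational assumption the preferentist approach relies on.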
