Epistemic Virtue Evaluations

AI systems should help people believe true things for the right reasons. We call this ‘epistemic virtue’: presenting clear, calibrated answers without ulterior motives and supporting, rather than undermining, users’ ability to make sense of the world. 

Epistemic virtue evaluations are a way to measure the extent to which AI systems meet this standard. Making high-quality evals readily available would help developers train virtuous systems, help consumers and regulators track performance, and create accountability pressures on developers who fall short.


Why this matters

Will the impact of widespread AI use be positive or negative for society’s ability to handle new challenges? Much of the answer could come down to how much AI distorts people’s beliefs, whether incidentally or deliberately, and how well AI tools and systems elevate people’s individual and collective ability to make sense of things.

In either case, we want to be able to recognize and measure the effects. Beyond “mere” factual accuracy (which may become less of a problem as models get stronger), knowing how cooperative and epistemically virtuous systems are could be crucial for people making informed choices about which AI systems to use. For some use cases, access to extremely precise statements from AI could be invaluable.


Examples of epistemic virtues and potential evaluations:

  • Loyalty / creator-bias checks: Use “flip tests” to check if an organization, ideology, or other entity is advantaged in otherwise-identical, neutral scenarios
  • Clarity: Penalize hedging that obscures accuracy; reward crisp summaries with citations traceable to ground truth
  • Calibration: Assess whether stated probabilities match observed outcome frequencies across diverse domains and ambiguity levels
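
To make two of these checks concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration rather than an established benchmark: the function names, the `{entity}` prompt template, and the stub scorer are hypothetical, and a real evaluation would replace the stub with a rubric or grader applied to model responses. The flip test scores the same scenario twice with only the named entity swapped; the calibration functions compute the Brier score and a simple binned expected calibration error over stated probabilities and 0/1 outcomes.

```python
from collections import defaultdict


def flip_test_gap(score_fn, template, entity_a, entity_b):
    """Score one scenario with each entity swapped in; return score(a) - score(b).

    A gap near zero suggests neither entity is advantaged in this scenario.
    """
    score_a = score_fn(template.format(entity=entity_a))
    score_b = score_fn(template.format(entity=entity_b))
    return score_a - score_b


def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def expected_calibration_error(forecasts, outcomes, n_bins=10):
    """Weighted mean gap between average stated probability and observed frequency."""
    bins = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    total = len(forecasts)
    ece = 0.0
    for items in bins.values():
        mean_p = sum(p for p, _ in items) / len(items)
        freq = sum(o for _, o in items) / len(items)
        ece += (len(items) / total) * abs(mean_p - freq)
    return ece


if __name__ == "__main__":
    # Hypothetical stub scorer: stands in for a grader of the model's response.
    def stub_scorer(prompt):
        return 1.0 if "trustworthy" in prompt else 0.0

    template = "Is {entity} a trustworthy source on this topic?"
    print(flip_test_gap(stub_scorer, template, "Organization A", "Organization B"))

    # Toy values for illustration only, not measurements of any system.
    forecasts = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
    outcomes = [1, 1, 1, 0, 0, 0]
    print(brier_score(forecasts, outcomes))
    print(expected_calibration_error(forecasts, outcomes))
```

In practice a flip test would aggregate gaps over many templates and entity pairs, and calibration would be reported separately by domain and ambiguity level, since a single pooled number can hide domain-specific miscalibration.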

These are illustrative and certainly not exhaustive. Part of the necessary theoretical work is identifying which evaluations are most important and where the biggest gaps lie.

We expect that some epistemic virtue evaluations will align with frontier companies’ incentives to improve capabilities and meet market demand, while others may be orthogonal (or potentially adversarial) to those incentives.

Both matter, but the latter seem likely to be underprovided, so we want to ensure there is a broader ecosystem of people and organizations thinking about, creating, and running these evaluations. Such an ecosystem also helps ensure that the public pays sufficient attention to these kinds of behaviors, so that AI developers are disincentivized from deviating from epistemic virtue.


Milestones

Theoretical foundation: Develop a broader framework and theory of epistemic evaluations; identify the most important epistemic virtue evaluations.

Operational scale: Build out suites of evaluations that operationalize parts of this theory and can cover many models, and regularly test AI models against meaningful evaluations, especially where good behavior, judged by social standards and general truth, might diverge from what’s best for developers.

Public salience: Reach a point where informed actors and the media regularly cite these evals when assessing AI systems, labs compete to improve on them, and some form of epistemic monitoring becomes a recognized dimension of AI quality.


We expect there is room for many individual and organizational contributions to create a thriving ecosystem of epistemic virtue evals. We’ve helped seed this with our support for the founding of Sophron Research.