Epistemic Virtue Evaluations

In AI, you get what you measure.

We need AI systems that work for users, rather than flattering them or subtly selling them things.
The companies producing AI face immense commercial pressures, and not all of those pressures reward honest, transparent behavior.
We need independent third-party evaluations that show which systems most help people believe true things for the right reasons, and which merely appear to.

AI systems should empower people to think critically and provide complete, accurate results, not just give comfortable, reassuring answers.

We want to see an ecosystem where:

We have great theoretical work identifying what it actually means for AI systems to genuinely embody — and enable — good, helpful reasoning
There are public benchmarks and privately withheld evals that all major AI products are assessed on
Journalists cover these evals and hold companies accountable when they fall short

See our related blog post to learn more about this initiative.
