The Epistemic Stack

The vision

We want the best AI systems to be a force for radically improved understanding and better decision-making — and we think they can be!

Used carelessly or maliciously, AI systems and tools can spread misleading information, give a false sense of confidence, and reinforce existing confusion or division. But the state of the art is already encouraging: some AI tools and workflows are enabling individuals to understand difficult topics more quickly and thoroughly. Despite this progress, we think they could be much better suited to challenging or contentious topics. In particular, this means situations where not everyone has the same information or incentives: the problem of collective sensemaking.¹

These are not novel challenges: human communicators can be forgetful, poorly informed, or outright deceitful. We have historically developed all kinds of approaches to mitigate this: check your sources, make sure mistakes and misleading information are labeled or retracted, repeat observations and experiments to make sure, look for arguments and evidence beyond what’s immediately presented to you, and so on.² Because these practices are time- and effort-intensive, people often don’t manage! Software using AI in the right ways can make these, and other information-hygiene processes, vastly cheaper and more accessible, with benefits for everyone, from scientific advancement to security to competence and alignment in government, as well as simply making everyday investigations faster and more reliable.

The epistemic stack is our term for a suite of technologies that help society track and communicate information. These technologies work not by directly arbitrating what’s true or who sees what, but by scaling and expanding everyone’s prospects for building full pictures of the available arguments and evidence on a topic. Precursor technologies in a similar spirit include libraries, citations, web search, encyclopedias (including Wikipedia), and web forums and archives. What does the next upgrade look like?

Inspired by idealized scientific discourse, AI background investigation can make evidence and reasoning dependencies, right down to raw observations and original thinking, far more accessible. Finding the sources not mentioned — including explicit rebuttals, alternative positions, and topically relevant points of information on all dependencies — is also crucial to building a full picture, and largely automatable: no more accidentally relying on old or discredited information. Finally, automated tools can also do some amount of suggested overall weighting and assessment of sources and claims, though this is inherently more subjective. Collaborative software trust layers may be a valuable complement. Journalists, researchers, analysts, regulators, and the general public all stand to benefit from support tooling like this, just as all benefit from web search, Wikipedia, and so on.
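As a rough illustration of what checking "evidence and reasoning dependencies" could mean mechanically, the sketch below walks a dependency graph and reports any discredited item a conclusion relies on, directly or transitively. The function name, the dictionary layout, and the example identifiers are all hypothetical; a real tool would first need to build such a graph from actual sources, which is the hard part.

```python
from typing import Dict, List, Set


def discredited_dependencies(
    depends_on: Dict[str, List[str]],
    discredited: Set[str],
    claim: str,
) -> List[str]:
    """Return discredited items that `claim` relies on, directly or transitively."""
    found: List[str] = []
    seen: Set[str] = set()
    stack = [claim]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also protects against cycles
        seen.add(node)
        if node in discredited and node != claim:
            found.append(node)
        stack.extend(depends_on.get(node, []))
    return sorted(found)


# Hypothetical example: a brief cites a review, which cites a retracted trial.
depends_on = {
    "policy_brief": ["review_2021", "survey_2019"],
    "review_2021": ["trial_2015", "trial_2017"],
    "survey_2019": [],
}
flagged = discredited_dependencies(depends_on, {"trial_2015"}, "policy_brief")
print(flagged)
```

Here the brief never cites the retracted trial itself, yet the traversal still surfaces it through the review.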

We are interested in catalyzing and supporting projects aimed at establishing or furthering these products (see our Epistemic Case Study competition). Although we expect much idiosyncratic exploration to give rise to innovations and best practices in this space, we think some of the most valuable applications will emerge from shared or shareable investigations, enabling people to get consensus on the full picture of the arguments — even if not on the all-told conclusions. Some of these will come from integration with widely used and shared platforms like social media and AI assistants.

Objectives

Easy-to-use, reliable knowledge support tooling: finding, gathering, and ingesting relevant resources, mapping their relationships as evidence and argument, understanding evidence and validity, and assessing overall conclusions and remaining uncertainties. Much of this knowledge work can be carried out by LLM-powered workflows, when the outputs are carefully created and validated, producing persistent, maintainable, growing knowledge bases.

Interoperable and compounding: like the scientific method, Wikipedia, and the internet itself, let knowledge work build on itself, be easily shared, expand in coverage and deepen over time… even when the different parties involved don’t share all of the same beliefs, principles, and priorities. Technical emphasis might be on:

Common data formats for sharing the structural artifacts of investigations, enabling subsequent work to build on earlier work
Technical attestation of sources and work done, reducing concerns about tampering
Contributions of review and scrutiny, with identity for trust development
Indexing and discovery of existing investigation artifacts (especially the relatively objective aspects)
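To make the first two items above more concrete, here is a minimal sketch of one possible shareable format: sources, claims, and relations serialized to canonical JSON, with SHA-256 digests so that later changes are detectable. Every class, field, and function name is an assumption made for illustration, not a proposed standard. A bare hash only gives tamper evidence; genuine attestation would additionally need digital signatures tied to an identity, which this sketch leaves out.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Source:
    url: str
    retrieved_text: str  # the content that was actually ingested

    def digest(self) -> str:
        # Hash the ingested content so readers can detect later changes.
        return hashlib.sha256(self.retrieved_text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Claim:
    claim_id: str
    text: str           # natural-language statement of the claim
    source_digest: str  # digest of the Source the claim was found in


@dataclass(frozen=True)
class Relation:
    from_claim: str
    to_claim: str
    kind: str  # e.g. "supports", "rebuts", "qualifies"


def _canonical(body: dict) -> str:
    # Sorted keys and fixed separators give a stable byte representation.
    return json.dumps(body, sort_keys=True, separators=(",", ":"))


def export_investigation(sources, claims, relations) -> str:
    """Serialize an investigation and attach a digest of its body."""
    body = {
        "sources": [{"url": s.url, "digest": s.digest()} for s in sources],
        "claims": [asdict(c) for c in claims],
        "relations": [asdict(r) for r in relations],
    }
    body_digest = hashlib.sha256(_canonical(body).encode("utf-8")).hexdigest()
    return json.dumps({"body": body, "body_digest": body_digest}, sort_keys=True)


def verify_investigation(exported: str) -> bool:
    """Recompute the body digest and compare it with the recorded one."""
    doc = json.loads(exported)
    recomputed = hashlib.sha256(_canonical(doc["body"]).encode("utf-8")).hexdigest()
    return recomputed == doc["body_digest"]


# Hypothetical usage with made-up content.
src = Source("https://example.org/report", "Eggs are fine in moderation.")
c1 = Claim("c1", "Eggs are fine in moderation.", src.digest())
c2 = Claim("c2", "Eggs raise cholesterol.", src.digest())
exported = export_investigation([src], [c1, c2], [Relation("c2", "c1", "rebuts")])
```

Because the artifact records only structure and provenance, a second party could import it and add its own claims or relations without having to accept the first party's conclusions.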

Integrated where it matters: while directly usable as queryable knowledge bases, real reach and broad benefits may come from making this foundational epistemic infrastructure easily available across the contexts where people communicate, gather insights, and reach conclusions or decisions. This means creative adaptation and integration with existing and imagined systems, which might include:

‘Deep research’, AI briefing, and chatbot-like products, where more checkable and compounding results are likely to simultaneously improve the quality of outputs and enable more trust in conclusions.
Legal and medical research, gathering relevant case law or systematic review for compliance.
Other scientific and academic work, where literature review and secondary research are critical to establishing and expanding the frontier.
Incumbent and emerging social media and other platforms where societal conversations take place.
Community notes and fact-checking, which inherently rely on repeatable investigation into the justification and origins of claims.
The tools used by foresight/forecasting practitioners and intelligence communities to gather and synthesize models, reference classes, and intel — and similarly those processes used by analysts and decision-makers throughout government.

In an ideal case, the most objective investigation — who said what and when — would be uncontroversial and easily shared. The main challenge is knowing where to look and what to search for.

Handling Objectivity and Subjectivity

Nearly as uncontroversial are the structural investigations into what evidence and reasons are given as support for claims, what topics are being discussed, and which points are similar to each other or are offered as disagreements — ultimately revealing the full range of arguments and evidence for, against, or qualifying a given position. This too can be fairly objective and repeatable, and thus compounding (though support for differing interpretations and ongoing improvement of coverage are important).

As for assessing the best arguments and evidence, or determining the highest priority evidence gaps or best courses of action: these are inherently more subjective! Of course there are useful best practices and much common ground, but considerations like trust, credences, and preferred methodologies necessarily make these levels of analysis less universally shareable. Nevertheless, disagreements stand the best chance of productively moving forward if the grounding structure is mutually transparent and interpretable.

These considerations point toward the value of:

shared, or at least mutually-intelligible protocols³ and data formats, especially for largely-objective structural and provenance levels of investigation
deferring trust and judgment to late-binding assessment layers of responsibility, which can be specialized, customized, and revised in light of new considerations or based on a party’s perspective, while keeping grounding structure and its provenance easily accessible
indexing and discovery mechanisms for sharing and building on the mostly objective levels of structural and provenance analysis
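One way to picture a late-binding assessment layer is sketched below: the evidence structure is shared, while each party applies its own trust policy only at read time. The names `Evidence`, `TrustPolicy`, and `assess`, along with the trust-weighted sum, are illustrative assumptions; a weighted sum is a deliberately crude stand-in for whatever assessment method a party actually prefers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Evidence:
    claim: str   # the claim this evidence bears on
    source: str  # who offered it (shared provenance)
    stance: int  # +1 supports, -1 rebuts (shared structure)


# An assessment layer: maps a source name to a trust weight in [0, 1].
TrustPolicy = Callable[[str], float]


def assess(evidence: List[Evidence], trust: TrustPolicy) -> Dict[str, float]:
    """Combine shared evidence with one party's trust policy at read time."""
    scores: Dict[str, float] = {}
    for item in evidence:
        weighted = item.stance * trust(item.source)
        scores[item.claim] = scores.get(item.claim, 0.0) + weighted
    return scores


# Shared, relatively objective layer: who offered what, and in which direction.
shared = [
    Evidence("X is safe", "agency_report", +1),
    Evidence("X is safe", "independent_lab", -1),
]


# Two hypothetical parties who trust the same sources differently.
def policy_a(source: str) -> float:
    return {"agency_report": 0.75, "independent_lab": 0.25}.get(source, 0.0)


def policy_b(source: str) -> float:
    return {"agency_report": 0.25, "independent_lab": 0.75}.get(source, 0.0)


print(assess(shared, policy_a))  # {'X is safe': 0.5}
print(assess(shared, policy_b))  # {'X is safe': -0.5}
```

The two parties reach opposite conclusions, but they can see that the disagreement comes from how much they trust each source and not from what evidence exists.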

This perspective aims to treat as much of the epistemic stack as possible as a knowledge commons, while taking seriously the diverse and sometimes adversarial nature of communication and deliberation.

Lab Leaks, Black Holes, and Eggs – Epistemic Case Study Competition is our current effort to facilitate progress in this space.


¹ Why are these cases so important? The frontier of science, especially with and involving AI, is moving fast, and has enormous implications for our future. And with this potential upheaval comes disruption to politics and society which people will be better equipped to deal with if they can understand their situations better.
² These happen throughout forms and types of communication. Fact-checking, for example, in science consists of looking into citations, and demanding clearer citations when they’re missing; in media it demands linking and referring to justifications and providing verified testimony or documented evidence; and in conversation it consists of moves like asking others how (and from whom) they learned about what they’re sharing and assessing their confidence and rationale.
³ Protocols, when built for today’s and future AI systems, can look much different than what may be typically envisioned. For example, it’s possible that extensive use of natural language is best, rather than a restrictive schema with limited concepts and simplified information.