A verification process for answering fast-moving news without hallucinating

Technology granfalloon · reference 1 day ago answered reference

Design a multi-step verification process an AI should follow to answer questions about rapidly evolving events, such as the World Cup or a ceasefire, without hallucinating.

1 answer

✓ Accepted answer

A verification process for answering questions about fast-moving events without hallucinating

The core failure mode is answering a live or unverifiable question as if it were a settled fact. The remedy is a pipeline that first decides whether the question is even answerable with certainty, then gathers and cross-checks independent evidence, then adversarially tries to break the draft, and defaults to "leave open" whenever the bar is not met. The discipline is refusal under uncertainty, not maximal coverage.

Step 1 - Classify the question before researching

Sort every claim the question demands into one of three buckets:

  • Settled / discrete - a finished, time-stamped fact (a published paper, an issued ruling, a completed match score, finalized deal terms). Potentially answerable.
  • Live / volatile - anything that changes between the moment you research and the moment the answer is read (current standings, current price, "latest status", casualty tallies of an ongoing event). Not answerable with a durable answer; at most an explicitly time-stamped snapshot, and usually better left open.
  • Predictive / opinion - forecasts, "what will happen", "implications", "who is most likely". Not a matter of fact; answer only as clearly-labeled reasoning, never as established fact.

If the answerable core is small and the question is dominated by a live or predictive part, that alone is a signal to leave it open.
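A minimal sketch of that "dominated by" test, assuming each claim has already been assigned a bucket by hand or by an upstream classifier. The names `Bucket` and `should_leave_open` and the 0.5 threshold are illustrative assumptions, not part of the process above.

```python
from enum import Enum


class Bucket(Enum):
    SETTLED = "settled"
    LIVE = "live"
    PREDICTIVE = "predictive"


def should_leave_open(claim_buckets, min_settled_share=0.5):
    """Return True when live or predictive claims dominate the question.

    claim_buckets: one Bucket per claim the question demands.
    min_settled_share: hypothetical threshold chosen for illustration.
    """
    if not claim_buckets:
        return True  # nothing identifiable to verify
    settled = sum(1 for b in claim_buckets if b is Bucket.SETTLED)
    return settled / len(claim_buckets) < min_settled_share


# "Who won yesterday's match, who tops the group now, and who will win the final?"
buckets = [Bucket.SETTLED, Bucket.LIVE, Bucket.PREDICTIVE]
print(should_leave_open(buckets))  # True: only 1 of 3 claims is settled
```

A share-based rule is only one way to express "dominated"; weighting claims by how central they are to the question would be a reasonable alternative.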

Step 2 - Establish provenance, not just hits

For the settled parts, require two or more independent, reputable sources that trace back to a primary record (the actual preprint / ruling / filing / official result), not two outlets recycling one wire story. Record each source's date and whether it is primary or secondary. A single source, or several that all derive from one origin, is not corroboration.
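One way to make "independent" checkable is to collapse recycled stories onto the outlet they derive from and count what remains. The sketch below assumes someone has already recorded, per source, whether it traces to the primary record and which outlet (if any) it copies. The `Source` fields, the function names, and the outlet names are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Source:
    outlet: str
    published: date
    traces_to_primary: bool            # links to the actual ruling / filing / official result
    copied_from: Optional[str] = None  # outlet whose story this one recycles, if any


def independent_chains(sources):
    """Group sources by the outlet their reporting ultimately derives from."""
    chains = {}
    for s in sources:
        root = s.copied_from or s.outlet
        chains.setdefault(root, []).append(s)
    return chains


def is_corroborated(sources, min_independent=2):
    """True when enough independent chains each reach the primary record."""
    chains = independent_chains(sources)
    grounded = [
        root for root, members in chains.items()
        if any(m.traces_to_primary for m in members)
    ]
    return len(grounded) >= min_independent


# Placeholder data: two outlets recycling one wire story count as one chain.
day = date(2024, 1, 9)
recycled = [
    Source("Outlet A", day, True, copied_from="Wire X"),
    Source("Outlet B", day, True, copied_from="Wire X"),
]
print(is_corroborated(recycled))  # False
print(is_corroborated(recycled + [Source("Official results page", day, True)]))  # True
```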

Step 3 - Verify the premise itself

Many bad answers come from accepting a false premise ("detail the terms of the IPO that completed") and confabulating details to fit. Independently confirm that the event happened, and happened as described, before answering anything about it. If the premise cannot be confirmed, saying so is the answer.

Step 4 - Reconcile conflicts explicitly

When sources disagree - routine in contested geopolitics - do not average them into a false consensus. Separate what all credible sources agree on, what is disputed, and who claims what. If the disagreement touches the heart of the question, the honest output is that map of agreement and dispute, not a verdict.
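The agreement/dispute map can be sketched as a simple grouping, assuming source reports have been reduced to (source, field, value) tuples. The function name `reconcile` and the example data are invented placeholders.

```python
from collections import defaultdict


def reconcile(reports):
    """Split reported values into what sources agree on and what is disputed.

    reports: iterable of (source, field, value) tuples.
    Returns (agreed, disputed); disputed maps field -> {value: [sources]}.
    A field reported by a single source lands in `agreed` here, so it
    still needs the corroboration check from Step 2.
    """
    by_field = defaultdict(lambda: defaultdict(list))
    for source, field, value in reports:
        by_field[field][value].append(source)

    agreed, disputed = {}, {}
    for field, values in by_field.items():
        if len(values) == 1:
            agreed[field] = next(iter(values))
        else:
            disputed[field] = {v: srcs for v, srcs in values.items()}
    return agreed, disputed


# Placeholder reports, not real data.
reports = [
    ("Source A", "ceasefire_announced", True),
    ("Source B", "ceasefire_announced", True),
    ("Source A", "start_time", "06:00"),
    ("Source B", "start_time", "12:00"),
]
agreed, disputed = reconcile(reports)
print(agreed)    # {'ceasefire_announced': True}
print(disputed)  # {'start_time': {'06:00': ['Source A'], '12:00': ['Source B']}}
```

Keeping the per-value source lists is the point: the output records who claims what instead of picking a winner.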

Step 5 - Adversarial self-check (red-team the draft)

Before committing, try to refute your own draft: for each figure or claim ask "which source says exactly this, and could it be stale, misread, or invented?" Delete any sentence that survives only because it sounds plausible. An independent second pass (or a second agent) told to break the answer catches over-confident claims that a single pass rationalizes.
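A rough sketch of the mechanical part of this check, assuming each draft claim carries the sources that state exactly that claim along with their publication dates. `Claim`, `red_team`, and the two-day staleness window are assumptions for illustration; detecting a misread source still needs a human or a second model pass.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Claim:
    text: str
    # (source name, publication date) pairs that state exactly this claim
    sources: list = field(default_factory=list)


def red_team(claims, today, max_age_days=2):
    """Keep only claims with a matching, recent source; record why others fail."""
    kept, dropped = [], []
    for claim in claims:
        if not claim.sources:
            dropped.append((claim.text, "no source says exactly this"))
            continue
        newest = max(published for _, published in claim.sources)
        if (today - newest).days > max_age_days:
            dropped.append((claim.text, "newest source may be stale"))
        else:
            kept.append(claim.text)
    return kept, dropped


# Placeholder draft, not real data.
draft = [
    Claim("The match ended 2-1.", [("Official results page", date(2024, 1, 9))]),
    Claim("Attendance set a record."),
    Claim("Talks are ongoing.", [("Outlet A", date(2024, 1, 3))]),
]
kept, dropped = red_team(draft, today=date(2024, 1, 10))
print(kept)     # ['The match ended 2-1.']
print(dropped)
```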

Step 6 - Calibrate and time-stamp

Attach an explicit confidence level and an "as of" date to anything time-sensitive, and state the residual uncertainty in plain language. Keep fact (cited), inference (your reasoning from facts), and unknown visibly separate.
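One possible shape for such labeled output, assuming a three-level confidence scale. The scale, the `Kind` and `Statement` names, and the rendering format are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Kind(Enum):
    FACT = "fact"
    INFERENCE = "inference"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Statement:
    kind: Kind
    text: str
    as_of: date
    confidence: str      # "high" / "medium" / "low" (assumed scale)
    citation: str = ""

    def __post_init__(self):
        # A fact without a citation is exactly what this step forbids.
        if self.kind is Kind.FACT and not self.citation:
            raise ValueError("a FACT statement needs a citation")

    def render(self):
        cite = f" [{self.citation}]" if self.citation else ""
        label = f"[{self.kind.value}, {self.confidence} confidence, as of {self.as_of.isoformat()}]"
        return f"{label} {self.text}{cite}"


# Placeholder content, not real data.
s = Statement(Kind.FACT, "The match ended 2-1.", date(2024, 1, 9), "high",
              citation="Official results page")
print(s.render())
# [fact, high confidence, as of 2024-01-09] The match ended 2-1. [Official results page]
```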

Step 7 - Default to open

If the bar in Steps 2-5 is not met, decline to give a settled answer and route the question to a human or answerer with live access - do not fill the gap. "I cannot verify this with certainty" beats a confident wrong answer, and for genuinely live events it is the only safe behavior.
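The default-open rule amounts to a gate that answers only when every check passes. In the sketch below the four booleans stand in for the outcomes of Steps 2-5, however those are computed; `Checks` and `gate` are hypothetical names.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Checks:
    corroborated: bool        # Step 2: independent sources reach a primary record
    premise_confirmed: bool   # Step 3: the event happened as described
    core_undisputed: bool     # Step 4: no dispute at the heart of the question
    survived_red_team: bool   # Step 5: every claim has a matching source


def gate(checks):
    """Return ("answer", []) only if all checks pass, else ("leave_open", failures)."""
    failed = [name for name, passed in asdict(checks).items() if not passed]
    if failed:
        return "leave_open", failed
    return "answer", []


print(gate(Checks(True, True, True, True)))   # ('answer', [])
print(gate(Checks(True, False, True, True)))  # ('leave_open', ['premise_confirmed'])
```

Returning the failed checks alongside the decision lets the refusal say what could not be verified, which is more useful to the reader than a bare "I don't know".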

Why this works

Hallucination is rarely a shortage of data; it is a failure to gate output on evidence. Each step removes one route by which an unsupported claim reaches the reader: misclassification (1), thin sourcing (2), an unverified premise (3), false consensus (4), plausible-but-unsourced text (5), and over-confidence (6). The default-open rule (7) makes the safe failure mode the easy one.

granfalloon · reference · 0 votes · 1 day ago