From Prompt to Reviewable Trust Artifact

A single chat completion collapses context selection, reasoning, uncertainty, and recommendation into one piece of prose. A review pipeline separates those responsibilities so a reviewer can inspect where a conclusion came from.

1. Start with the public contract

Internal phases and model assignments can evolve. The stable contract should be the artifact that leaves the system: a decision, its evidence, assumptions, rejected hypotheses, risks, open checks, and a trust verdict.

If the contract is "three agents discussed the task," every prompt or model change also changes what reviewers receive, and there is nothing stable to test against. If the contract is evidence-backed output, the implementation remains free to evolve while the review boundary stays testable.

prompt → staged analysis → evidence-backed artifact → human decision
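One way to make that boundary testable is to turn the artifact into a typed record and check it with a small validator that never looks at prompts or agent transcripts. The following is a minimal Python sketch. `TrustArtifact`, `Verdict`, its three levels, and `validate_contract` are hypothetical names, not the product's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # Hypothetical verdict levels, listed from most to least usable.
    READY = "ready"
    NEEDS_REVIEW = "needs_review"
    DIAGNOSTIC = "diagnostic"


@dataclass(frozen=True)
class TrustArtifact:
    """Hypothetical shape of the artifact that leaves the pipeline."""
    decision: str
    evidence: tuple[str, ...]              # e.g. "src/auth.py:40-72" (illustrative)
    assumptions: tuple[str, ...] = ()
    rejected_hypotheses: tuple[str, ...] = ()
    risks: tuple[str, ...] = ()
    open_checks: tuple[str, ...] = ()
    verdict: Verdict = Verdict.NEEDS_REVIEW


def validate_contract(artifact: TrustArtifact) -> list[str]:
    """Return contract violations; an empty list means the artifact is reviewable."""
    problems = []
    if not artifact.decision.strip():
        problems.append("decision is empty")
    if artifact.verdict is Verdict.READY and not artifact.evidence:
        problems.append("READY verdict without any evidence")
    if artifact.verdict is Verdict.READY and artifact.open_checks:
        problems.append("READY verdict with open checks remaining")
    return problems
```

Because the checks read only the artifact, swapping a model or adding an internal phase does not invalidate them.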

2. Context preparation: decide what the system is allowed to know

Repository analysis begins before a model responds. The pipeline must identify the project configuration, task constraints, relevant files, structural symbols, and any edition or privacy rules that limit where data may flow.

A useful context layer records:

which project is active;
which files and ranges were delivered;
which requested targets were unavailable or excluded;
which provider receives each phase;
which task constraints must survive every later transformation.

Context selection is itself a source of error. If the file that owns the relevant behavior is absent, later agents may converge on a coherent explanation that the actual implementation does not support. Missing context must remain visible.
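A context record along these lines keeps unavailable targets visible after delivery instead of letting them disappear. This is a minimal sketch that assumes one delivered line range per file. `ContextManifest`, its methods, and the example paths are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ContextManifest:
    """Hypothetical record of what the pipeline was allowed to know."""
    project: str
    delivered: dict[str, tuple[int, int]] = field(default_factory=dict)  # path -> (first, last) line
    unavailable: dict[str, str] = field(default_factory=dict)          # path -> reason
    providers: dict[str, str] = field(default_factory=dict)            # phase -> provider
    constraints: tuple[str, ...] = ()

    def deliver(self, path: str, first: int, last: int) -> None:
        self.delivered[path] = (first, last)
        # A later successful delivery supersedes an earlier "unavailable" record.
        self.unavailable.pop(path, None)

    def mark_unavailable(self, path: str, reason: str) -> None:
        if path not in self.delivered:
            self.unavailable[path] = reason

    def missing(self, requested: list[str]) -> list[str]:
        """Requested targets that never reached the model, in request order."""
        return [p for p in requested if p not in self.delivered]

    def summary(self) -> str:
        lines = [f"project: {self.project}"]
        lines += [f"phase {ph} -> {pv}" for ph, pv in sorted(self.providers.items())]
        lines += [f"constraint: {c}" for c in self.constraints]
        lines += [f"delivered {p}:{a}-{b}" for p, (a, b) in sorted(self.delivered.items())]
        # Missing targets are listed explicitly so later phases cannot lose them.
        lines += [f"MISSING {p} ({r})" for p, r in sorted(self.unavailable.items())]
        return "\n".join(lines)
```

A downstream phase can call `missing()` with the targets it asked for and record the result as an assumption in the artifact, rather than proceed as if those files had been read.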

3. Proposal and critique: create disagreement deliberately

The first candidate is a hypothesis, not a verdict. Independent review passes should challenge its root cause, compatibility assumptions, test plan, and blast radius. The goal is not agent count. The goal is differentiated work.

Pass	Primary responsibility
Proposal	Produce a concrete explanation or change candidate.
Critique	Find unsupported premises and counterexamples.
Implementation review	Check repository fit, boundaries, and tests.
Reconciliation	Resolve contradictions using evidence rather than majority vote.

Agreement is useful only after positions are independently grounded. Several agents repeating the same context-limited answer do not create confidence. A critique that forces the proposal to narrow is often more valuable than consensus.
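The reconciliation rule can be illustrated with a small function that groups answers by verified support instead of by how many passes repeated them. It assumes the set of verified evidence identifiers comes from the evidence-expansion step. `Position`, `reconcile`, and the status strings are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """One pass's answer to a single question, with the evidence it cites."""
    agent: str
    answer: str
    evidence: frozenset[str]


def reconcile(positions: list[Position], verified: set[str]) -> dict:
    """Resolve by independently verified support; repetition alone counts for nothing."""
    support: dict[str, set[str]] = {}
    for p in positions:
        # Only citations that were actually verified contribute support.
        support.setdefault(p.answer, set()).update(p.evidence & verified)
    grounded = {answer: ev for answer, ev in support.items() if ev}
    if len(grounded) == 1:
        answer, ev = next(iter(grounded.items()))
        return {"status": "resolved", "answer": answer, "evidence": sorted(ev)}
    if len(grounded) > 1:
        # The contradiction is preserved, not broken by a vote.
        return {"status": "contradiction", "answers": sorted(grounded)}
    return {"status": "unsupported", "answers": sorted(support)}
```

If three passes repeat an ungrounded cause and one pass cites a verified line, the grounded answer is the only one that resolves. If two answers are both grounded, the function returns the contradiction for the reconciliation phase to surface.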

4. Evidence expansion: fetch what the claims depend on

During analysis, a claim may name a file, symbol, caller, configuration value, or test that was not part of the initial context. The pipeline should treat this as a request for evidence rather than let the model fill the gap from prior knowledge.

An evidence-expansion result should distinguish:

fetched: the requested target was found and delivered;
absent: a bounded search established that it does not exist;
unresolved: the target could not be located or disambiguated;
skipped: policy, limits, duplication, or another explicit reason prevented delivery.

Those states are not interchangeable. "Not fetched" must never silently become evidence of absence.
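Encoding the four states as distinct values makes that rule enforceable rather than a convention. A minimal sketch, with `EvidenceState`, `supports_absence_claim`, and `describe` as hypothetical names:

```python
from enum import Enum


class EvidenceState(Enum):
    FETCHED = "fetched"        # target found and delivered
    ABSENT = "absent"          # a bounded search established it does not exist
    UNRESOLVED = "unresolved"  # could not be located or disambiguated
    SKIPPED = "skipped"        # policy, limits, duplication, or another stated reason


def supports_absence_claim(state: EvidenceState) -> bool:
    """Only a completed bounded search may back a claim that something does not exist."""
    return state is EvidenceState.ABSENT


def describe(target: str, state: EvidenceState, reason: str = "") -> str:
    """Render an evidence result; a skipped request must always say why."""
    if state is EvidenceState.SKIPPED and not reason:
        raise ValueError(f"skipped request for {target!r} must state a reason")
    suffix = f" ({reason})" if reason else ""
    return f"{target}: {state.value}{suffix}"
```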

5. Reconcile into a trust verdict

The final phase should not merely summarize the discussion. It should adjudicate claims against the evidence graph, preserve unresolved contradictions, and choose a verdict that matches the weakest material dependency.

Artifact field	Why it exists
Decision	Gives the operator a bounded next action.
Evidence used	Makes support inspectable.
Assumptions	Prevents unknowns from masquerading as facts.
Rejected hypotheses	Shows that alternatives were considered.
Risks and open checks	Defines remaining human or automated work.
Trust verdict	States how far the result can be used now.

A diagnostic verdict can be the correct output. The system should prefer a bounded "needs review" over a confident merge recommendation when a material evidence edge is missing.
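The "weakest material dependency" rule amounts to taking a minimum over per-edge ceilings. In this sketch the verdict levels, the `CEILING` table (including a `contradicted` state for preserved contradictions), and `trust_verdict` are assumptions chosen for illustration:

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Hypothetical ordering: a lower value means the result can be used less.
    DIAGNOSTIC = 1
    NEEDS_REVIEW = 2
    READY = 3


# Hypothetical ceiling: the best verdict an evidence edge in each state can support.
CEILING = {
    "fetched": Verdict.READY,
    "absent": Verdict.READY,
    "unresolved": Verdict.NEEDS_REVIEW,
    "skipped": Verdict.NEEDS_REVIEW,
    "contradicted": Verdict.DIAGNOSTIC,
}


def trust_verdict(dependencies: dict[str, tuple[str, bool]]) -> Verdict:
    """dependencies maps an evidence edge to (state, is_material)."""
    ceilings = [CEILING[state] for state, material in dependencies.values() if material]
    # The weakest material edge caps the verdict; no material evidence means diagnostic only.
    return min(ceilings, default=Verdict.DIAGNOSTIC)
```

Non-material edges, such as a skipped README, do not lower the verdict, while a single unresolved material edge caps the result at needs review.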

6. Failure semantics are part of the product

A trustworthy pipeline must carry failures forward: model truncation, unavailable providers, skipped evidence requests, malformed responses, test failures, and unresolved targets. Recovering from an error is acceptable; erasing it is not.
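Carrying failures forward can be as simple as never building a phase result from an empty failure list. This sketch assumes each phase returns its value together with non-fatal notes such as truncation. `StepResult`, `Phase`, and `run_phase` are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StepResult:
    """Hypothetical output of one pipeline phase plus every failure seen so far."""
    value: object
    failures: list[str] = field(default_factory=list)


# A phase takes the previous value and returns (new value, non-fatal failure notes).
Phase = Callable[[object], tuple[object, list[str]]]


def run_phase(name: str, phase: Phase, previous: StepResult) -> StepResult:
    failures = list(previous.failures)  # earlier failures are always inherited
    try:
        value, notes = phase(previous.value)
        failures.extend(f"{name}: {note}" for note in notes)
    except Exception as exc:
        # Recover by keeping the previous value, but record why the phase failed.
        failures.append(f"{name}: {type(exc).__name__}: {exc}")
        value = previous.value
    return StepResult(value, failures)
```

A phase that raises is recovered from, yet its error text stays in `failures` and can reach the artifact's risks and open checks.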

This leads to a practical invariant: every autonomous action should pass through an observable verdict gate before it is treated as authoritative. The gate can accept, downgrade, request more evidence, or stop. It should not silently upgrade uncertainty.
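A verdict gate with those four outcomes might look like the following sketch. The level names, the rule that pending evidence caps a result at needs review, and the `gate` signature are assumptions. The invariant it demonstrates is that the returned level never exceeds either the proposed or the supported level, and that every decision is appended to an audit log.

```python
from enum import Enum


class Gate(Enum):
    ACCEPT = "accept"
    DOWNGRADE = "downgrade"
    REQUEST_EVIDENCE = "request_evidence"
    STOP = "stop"


LEVELS = ["diagnostic", "needs_review", "ready"]  # hypothetical, least to most usable


def gate(proposed: str, supported: str, pending_requests: int,
         blocking_failure: bool, audit: list[str]) -> tuple[Gate, str]:
    """Decide what happens to an autonomous action without ever upgrading it."""
    if blocking_failure:
        decision = (Gate.STOP, "diagnostic")
    elif pending_requests:
        # While evidence is outstanding, the result is at most "needs_review".
        decision = (Gate.REQUEST_EVIDENCE,
                    min(proposed, supported, "needs_review", key=LEVELS.index))
    elif LEVELS.index(proposed) > LEVELS.index(supported):
        decision = (Gate.DOWNGRADE, supported)
    else:
        decision = (Gate.ACCEPT, proposed)
    # Every decision is recorded so the gate stays observable.
    audit.append(f"{proposed} -> {decision[0].value}:{decision[1]}")
    return decision
```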

The pipeline is valuable when it makes the boundary between checked, inferred, and unknown information harder to lose.

The operational commands and artifact locations are documented in the product documentation. For a concrete scenario, read the OIDC callback race-condition case study.