This is an illustrative, sanitized scenario. It demonstrates the shape of an evidence-backed review and does not report a customer repository or measured deployment result.
1. The review request
A pull request changes retry behavior in middleware used by a request-processing path. The author adds a unit test for the new condition, and the existing test suite passes. The requested review is:
> Review this pull request for risky assumptions, missing tests, and unsafe architectural changes.
A generic review might restate the diff or approve because CI is green. An evidence-backed review asks whether the tests cover the behavior that changed and whether the new retry semantics interact with concurrency or side effects.
2. Build a risk map from the change boundary
The changed branch affects more than its local function:
- the caller decides whether an operation can be replayed;
- the middleware controls retry count and delay;
- the downstream operation may have a non-idempotent side effect;
- concurrent requests can enter the same retry state;
- observability must distinguish the original attempt from retries.
The review should inspect each of those ownership points and state which ones it could not inspect. It should not infer idempotency from a function name, nor assume concurrency safety because unit tests are green.
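One way to keep the risk map honest is to record it as data, so that uninspected points are reported rather than silently skipped. The following is a minimal sketch under stated assumptions: the type names, the `unavailable_points` helper, and the inspection statuses are all hypothetical and chosen only to mirror the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Inspection(Enum):
    INSPECTED = "inspected"
    UNAVAILABLE = "unavailable"  # code or config the reviewer could not read


@dataclass(frozen=True)
class OwnershipPoint:
    name: str
    question: str
    inspection: Inspection


def unavailable_points(risk_map: list[OwnershipPoint]) -> list[str]:
    """Names of the points the review must report as not inspected."""
    return [p.name for p in risk_map if p.inspection is Inspection.UNAVAILABLE]


# Inspection statuses below are invented for illustration.
RISK_MAP = [
    OwnershipPoint("caller", "Does it decide whether the operation can be replayed?",
                   Inspection.INSPECTED),
    OwnershipPoint("middleware", "What retry count and delay does it apply?",
                   Inspection.INSPECTED),
    OwnershipPoint("downstream operation", "Is its side effect idempotent?",
                   Inspection.UNAVAILABLE),
    OwnershipPoint("concurrency", "Can concurrent requests enter the same retry state?",
                   Inspection.INSPECTED),
    OwnershipPoint("observability", "Are retries distinguishable from the original attempt?",
                   Inspection.UNAVAILABLE),
]
```

With this shape, a review that never looked at the downstream operation has to list it as unavailable instead of implying it was checked.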
3. Detect the missing test, then explain why it matters
Suppose the changed middleware path has unit coverage for one failed attempt followed by success, but repository search finds no integration test that drives concurrent requests through the same retry boundary.
| Evidence | What it establishes | What it does not establish |
|---|---|---|
| New unit test | Single-request retry branch behaves as asserted. | Concurrent behavior or downstream idempotency. |
| Existing suite passes | No covered regression was detected. | Coverage of the new risk boundary. |
| No matching integration test found | Search did not locate expected coverage. | Definitive proof that no equivalent test exists. |
The last distinction matters. Search can miss generated, dynamically named, or externally hosted tests. The review artifact should report the search scope and frame absence as bounded evidence, not universal proof.
4. Produce a bounded verdict
| Field | Result |
|---|---|
| Status | Needs review |
| Risk | Retry behavior may duplicate non-idempotent work under concurrency. |
| Evidence | Changed retry path; single-request unit coverage; no matching concurrent integration case found. |
| Assumption | The downstream operation is not independently idempotent. |
| Open check | Run a controlled concurrent request scenario before approval. |
This is more useful than “looks good” or “tests are missing.” It tells the reviewer which decision is blocked, what evidence supports the concern, and which check can resolve it.
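The open check in the verdict can be sketched as a small concurrent scenario. Everything here is a stand-in: `Ledger` is a hypothetical downstream whose first attempt per request applies its side effect and then times out, and `with_retry` is a simplified retry wrapper with no delay, not the middleware under review. A real check would drive the actual middleware and downstream.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Ledger:
    """Hypothetical downstream with a non-idempotent side effect."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._seen: set[str] = set()
        self.charges: list[str] = []

    def charge(self, request_id: str) -> str:
        with self._lock:
            self.charges.append(request_id)  # the side effect is applied first
            first_attempt = request_id not in self._seen
            self._seen.add(request_id)
        if first_attempt:
            # Simulates a timeout reported after the write already happened.
            raise TimeoutError(request_id)
        return "ok"


def with_retry(operation, request_id: str, max_attempts: int = 2) -> str:
    """Simplified retry wrapper: retries on TimeoutError, no backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(request_id)
        except TimeoutError:
            if attempt == max_attempts:
                raise
    raise AssertionError("max_attempts must be at least 1")


def run_scenario(request_count: int = 8) -> dict[str, int]:
    """Send concurrent requests through the retry wrapper and count side effects."""
    ledger = Ledger()
    ids = [f"req-{i}" for i in range(request_count)]
    with ThreadPoolExecutor(max_workers=request_count) as pool:
        results = list(pool.map(lambda rid: with_retry(ledger.charge, rid), ids))
    assert all(result == "ok" for result in results)
    return {"requests": request_count, "side_effects": len(ledger.charges)}
```

In this stand-in every request succeeds from the caller's point of view, yet the side-effect count is double the request count. That is the kind of result a single-request unit test cannot surface.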
5. Make the CI gate informative, not theatrical
A background review step should have explicit semantics:
- fail only on configured hard gates, not on every model concern;
- preserve diagnostic findings even when the process exits successfully;
- distinguish missing evidence from confirmed unsafe behavior;
- link findings to the reviewed commit and exact check scope;
- avoid sending secrets or excluded files to model providers;
- allow a human to reproduce the decisive checks.
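A few of these semantics can be sketched in code. The example assumes a hypothetical `Finding` record and a configured set of hard-gate rule names. The rule identifier, the `<commit-sha>` placeholder, and the scope string are illustrative only.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Finding:
    rule: str    # hypothetical rule identifier
    kind: str    # "missing_evidence" or "confirmed_unsafe"
    commit: str  # commit the finding was produced against
    scope: str   # exact scope of the check that produced it
    detail: str


def evaluate(findings: list[Finding], hard_gates: set[str]) -> tuple[int, str]:
    """Return (exit_code, report). Only configured hard gates can fail the step."""
    blocking = [f.rule for f in findings if f.rule in hard_gates]
    report = {
        "blocking_rules": blocking,
        # Every finding is kept in the report, including on a successful exit.
        "findings": [asdict(f) for f in findings],
    }
    return (1 if blocking else 0), json.dumps(report, indent=2)


EXAMPLE = Finding(
    rule="retry-duplicates-side-effect",
    kind="missing_evidence",
    commit="<commit-sha>",
    scope="integration tests searched for a concurrent retry case",
    detail="No matching concurrent integration case found.",
)
```

Calling `evaluate([EXAMPLE], set())` exits with 0 but still reports the finding. The same finding blocks only once a team adds its rule to `hard_gates`. Whether a `missing_evidence` finding should ever be a hard gate is a policy choice this sketch leaves open.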
Teams should calibrate the gate against actual review outcomes: false blocks, missed risks, time to disposition, and escaped defects. Until the gate's precision is known, a plausible AI finding is not enough reason to stop delivery.
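As a rough illustration of that calibration, the sketch below computes gate precision alongside false blocks, missed risks, and median time to disposition. The `Outcome` fields are assumptions about what a team might record per reviewed change. Escaped defects are not modeled here.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    blocked: bool                # the gate failed the pipeline
    real_risk: bool              # reviewers later judged the risk genuine
    hours_to_disposition: float  # time until a human resolved the finding


def calibrate(outcomes: list[Outcome]) -> dict[str, Optional[float]]:
    """Summarize gate behavior against recorded review outcomes."""
    true_blocks = sum(o.blocked and o.real_risk for o in outcomes)
    false_blocks = sum(o.blocked and not o.real_risk for o in outcomes)
    missed_risks = sum(o.real_risk and not o.blocked for o in outcomes)
    blocked_total = true_blocks + false_blocks
    return {
        "false_blocks": false_blocks,
        "missed_risks": missed_risks,
        # Precision: the share of blocks that pointed at a genuine risk.
        "precision": true_blocks / blocked_total if blocked_total else None,
        "median_hours_to_disposition": (
            median(o.hours_to_disposition for o in outcomes) if outcomes else None
        ),
    }
```

Precision is reported as `None` when the gate has never blocked, which keeps an untested gate from looking perfectly precise.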
A useful pre-merge gate narrows the next human decision. It does not replace that decision with a confidence score.
The compact version appears in Undes examples. For the economic model behind review load, read why review becomes the bottleneck.