
Illustrative case study

When a Valid Evidence Request Disappears Between Pipeline Stages

The planner selected the right file. The final answer never received it. The bug was not in model reasoning but in preserving canonical state across the execution flow.

This is an illustrative, sanitized scenario. The paths and state names are generic and are not copied from a downstream project.

1. The symptom

During planning, an agent requests an owner file needed to verify a root-cause claim. The request appears in the planning artifact. Later, synthesis asks for the same evidence again or produces a result without it. Logs show no obvious read error.

It is tempting to blame the model for forgetting. The stronger debugging question is whether the selected evidence survived each state boundary between planning and final context assembly.

2. Trace the artifact through every stage

Stage | Expected state | Observed risk
Planning | Canonical target selected | Target exists only in a display-oriented field.
Normalization | Target identity preserved | Aliases produce two unequal keys for the same file.
Fetch | Result linked to request | Content stored under a legacy collection.
Composition | Fetched evidence included | Composer reads only the canonical collection.
Synthesis | Claim has evidence anchor | Missing content is interpreted as not requested.

This trace distinguishes selection, delivery, and consumption. A successful file read proves delivery at one stage; it does not prove the final model received the content.
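
The stage-by-stage trace can be sketched as a small check that reports where a target first goes missing. This is a minimal illustration, not a real pipeline API: the stage names come from the table above, while `first_missing_stage` and the per-stage snapshot sets are hypothetical.

```python
from typing import Dict, Optional, Set

# Stage names in pipeline order, taken from the table above.
STAGES = ["planning", "normalization", "fetch", "composition", "synthesis"]


def first_missing_stage(target: str, snapshots: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first stage whose snapshot lacks the target, or None if it survived."""
    for stage in STAGES:
        if target not in snapshots.get(stage, set()):
            return stage
    return None


# Hypothetical snapshots: the set of target keys visible to each stage.
snapshots = {
    "planning": {"src/owner.py"},
    "normalization": {"src/owner.py"},
    "fetch": {"src/owner.py"},       # the read succeeded
    "composition": set(),            # but the composer never saw it
    "synthesis": set(),
}

lost_at = first_missing_stage("src/owner.py", snapshots)
print(lost_at)
```

Here the fetch snapshot contains the file, yet the check points at composition, which is the difference between delivery and consumption.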

3. Typical root cause: two sources of truth

The most common shape is a canonical artifact introduced beside a legacy alias. One producer writes the new representation, another still writes the old representation, and the final consumer reads only one. Tests pass for each helper in isolation while the cross-stage contract fails.
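
A minimal sketch of that shape, assuming a shared state dictionary with a hypothetical canonical `evidence` collection and a hypothetical legacy `legacy_files` alias. Neither name comes from a real project.

```python
# Hypothetical shared state with a canonical collection and a legacy alias.
state = {"evidence": {}, "legacy_files": {}}


def fetch_legacy(state, target, content):
    """Older producer: still writes the fetched content to the legacy alias."""
    state["legacy_files"][target] = content


def compose(state, requested):
    """Final consumer: reads only the canonical collection."""
    return {t: state["evidence"][t] for t in requested if t in state["evidence"]}


fetch_legacy(state, "src/owner.py", "def owner(): ...")
context = compose(state, ["src/owner.py"])

print(state["legacy_files"])  # the fetch helper did its job
print(context)                # the composer found nothing to include
```

Each helper behaves correctly against its own collection, so helper-level tests pass, yet the requested file never reaches the composed context.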

Other variants include:

  • path normalization differs between request and result;
  • a structured request is converted to display text and cannot be reconstructed;
  • deduplication removes a request before confirming equivalent evidence was delivered;
  • a per-phase limit records “skipped” but later logic treats it as “absent”;
  • temporary edition storage redirects one stage but not its consumer.

The general failure is loss of provenance. The pipeline can no longer answer which request produced which content and which final claim consumed it.
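
The first variant in the list above, path normalization that differs between request and result, can be illustrated with a single shared normalizer. This is a sketch under stated assumptions: targets are repository-relative paths, `canonical_target` is a hypothetical helper, and case is deliberately left untouched because case-insensitivity depends on the platform contract.

```python
import posixpath


def canonical_target(raw: str) -> str:
    """Normalize a repo-relative path to one canonical key.

    Converts backslashes to forward slashes, then collapses "./",
    duplicate slashes, and "x/../" segments. Case is not changed.
    """
    unified = raw.strip().replace("\\", "/")
    return posixpath.normpath(unified)


# Hypothetical alias forms of the same file as they might appear
# in a request, a fetch result, and a display field.
variants = ["./src/owner.py", "src//owner.py", "src\\owner.py", "src/util/../owner.py"]
keys = {canonical_target(v) for v in variants}
print(keys)
```

If the request side and the result side both call the same function, the aliases collapse to one key; if only one side does, the lookup silently misses.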

4. Fix the ownership boundary, not the symptom

A robust fix follows four rules:

  1. Choose one canonical request and result representation.
  2. Normalize target identity once at the module boundary.
  3. Make legacy aliases explicit read-only fallbacks, then remove them when migration completes.
  4. Carry terminal states such as fetched, absent, unresolved, and skipped without collapsing them.

Together, these rules keep the provenance chain intact:

request id → normalized target → fetch outcome → evidence node → final claim

The final artifact should make a broken edge observable. If synthesis depends on an unresolved target, the verdict must downgrade rather than silently complete as though the target were irrelevant.
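
One way to keep terminal states distinct and make a broken edge observable is sketched below. The four outcome names come from the rules above; `EvidenceEdge`, `verdict`, and the choice to downgrade on skipped as well as unresolved targets are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Outcome(Enum):
    FETCHED = "fetched"
    ABSENT = "absent"          # looked for and confirmed not to exist
    UNRESOLVED = "unresolved"  # could not be located or read
    SKIPPED = "skipped"        # never attempted, e.g. a per-phase limit


@dataclass(frozen=True)
class EvidenceEdge:
    request_id: str
    target: str
    outcome: Outcome


# Assumption: these outcomes mean the claim lacks evidence it asked for.
NOT_DELIVERED = {Outcome.UNRESOLVED, Outcome.SKIPPED}


def verdict(edges: List[EvidenceEdge]) -> str:
    """Return 'supported', or a downgraded verdict naming each broken edge."""
    broken = [e for e in edges if e.outcome in NOT_DELIVERED]
    if not broken:
        return "supported"
    details = ", ".join(f"{e.request_id} {e.target}: {e.outcome.value}" for e in broken)
    return f"downgraded ({details})"


edges = [
    EvidenceEdge("req-1", "src/owner.py", Outcome.FETCHED),
    EvidenceEdge("req-2", "src/config.py", Outcome.UNRESOLVED),
]
result = verdict(edges)
print(result)
```

Because the outcome travels with the request id and target, the downgraded verdict can say which edge broke instead of completing as though the target were irrelevant.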

5. Define the regression contract across stages

Unit tests for the normalizer and file reader are necessary but insufficient. Add a composition test that starts with a structured request and asserts the exact evidence appears in final context with the same canonical identity.

  • Cover slash, relative-path, and case variants only where the platform contract permits them.
  • Cover structured and string request forms during migration.
  • Assert skipped and unresolved requests remain visible.
  • Assert deduplication keeps a provenance link to the delivered equivalent.
  • Run the test through the real artifact construction order, not only direct helper calls.
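
A composition test along these lines might look like the sketch below. Everything in it is hypothetical: the in-memory `FILES` store, the `plan`, `fetch`, and `compose` stand-ins, and the per-phase `limit` are placeholders for whatever the real pipeline uses. The point is that the test calls the stages in their real order rather than one helper at a time.

```python
import posixpath

FILES = {"src/owner.py": "def owner(): ..."}  # hypothetical in-memory file store


def normalize(target):
    return posixpath.normpath(target.replace("\\", "/"))


def plan(raw_targets):
    """Create structured requests with a stable id and a canonical target."""
    return [{"id": f"req-{i}", "target": normalize(t)} for i, t in enumerate(raw_targets, 1)]


def fetch(requests, limit):
    """Attach an explicit outcome to every request, including skipped ones."""
    results = []
    for index, request in enumerate(requests):
        if index >= limit:
            results.append({**request, "outcome": "skipped", "content": None})
            continue
        content = FILES.get(request["target"])
        outcome = "fetched" if content is not None else "absent"
        results.append({**request, "outcome": outcome, "content": content})
    return results


def compose(results):
    """Build the final context keyed by canonical target, dropping nothing."""
    return {r["target"]: r for r in results}


def test_request_survives_to_final_context():
    raw = ["./src//owner.py", "src/missing.py", "src/extra.py"]
    context = compose(fetch(plan(raw), limit=2))

    delivered = context["src/owner.py"]
    assert delivered["id"] == "req-1"                      # provenance preserved
    assert delivered["content"] == FILES["src/owner.py"]   # exact evidence present
    assert context["src/missing.py"]["outcome"] == "absent"
    assert context["src/extra.py"]["outcome"] == "skipped"  # still visible


test_request_survives_to_final_context()
```

The assertions target the final context only, so the test fails if any intermediate stage drops, renames, or re-keys the request.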

In a staged AI pipeline, preserving intent is a data-contract problem before it is a model problem.

See the shorter execution-flow example and the broader article on building a reviewable trust artifact.