ADR-0008 — Perception Failure Modes and Uncertainty Propagation¶

Status: Accepted
Date: 2026-06-04

Context¶

ADR-0000 commits Project Ghost to GPS-denied vision-inertial autonomy. ADR-0001 freezes the HAL contract, ADR-0005 freezes the canonical VehicleState, and ADR-0006 freezes the event bus. None of these documents specifies what the system does when perception fails or degrades — when the camera sees a textureless wall, the IMU saturates, the scene goes dark, or VIO loses tracking.

The architecture red-team review (docs/reviews/architecture_red_team_review.md, §2.6) calls this out bluntly: "GPS-denied vision-only is hard. Without features, VO fails. Without loop closure, drift accumulates. The current design does not specify graceful degradation. Without an ADR on perception failure modes, the vision-only promise is marketing."

There are two intertwined problems:

Failure modes are nameless. The codebase will accumulate ad-hoc checks (if num_features < 10: ...) scattered across perception, estimation, and control. Each developer invents thresholds. Behavior under degradation becomes emergent and untestable.
Uncertainty is implicit. ADR-0005 mentions a covariance_15x15 slot on NavigationState but does not bind a contract for who fills it, how it propagates, and what consumers may assume. Without a uniform contract, a covariance produced by VO will be incommensurable with one produced by EKF, and downstream code will silently treat valid and stale estimates as equivalent.

This ADR closes both gaps for the lifetime of the project. It is a precondition for ADR-0009 (Autonomy Under Uncertainty), which builds the behavioral policy on top of this substrate.

Decision¶

Project Ghost adopts a typed catalog of perception failure modes, a uniform uncertainty envelope carried by every perceptual estimate, and quantitative entry criteria for each mode. These are frozen contracts; modules may not invent private equivalents.

1. Uncertainty envelope (the `Estimate[T]` contract)¶

Every perceptual or estimation output that crosses a module boundary is wrapped in an Estimate[T] with the following fields:

Field	Type	Meaning
`value`	`T`	The estimate itself (pose, depth, feature track, etc.)
`covariance`	`np.ndarray \\| None`	Second-moment uncertainty in the same basis as `value`. `None` is only legal when `source.kind == "groundtruth"`.
`confidence`	`float \\| None`	Scalar in `[0, 1]`; semantic is producer-defined and documented in `docs/specs/uncertainty.md`. Optional.
`validity`	`Validity` enum	`VALID`, `DEGRADED`, `STALE`, `INVALID`. See §2.
`stamp_sim_ns`	`int`	When the estimate was produced, not consumed.
`source`	`EstimateSource`	Producer identity (`module_id`, `kind`, `schema_version`).

The envelope is itself a frozen dataclass. Arrays inside value and covariance are sealed with flags.writeable=False at construction, consistent with ADR-0005.

Composition rule. When a consumer combines two Estimate[T] (e.g. EKF update), it MUST emit a new Estimate[T] whose validity is the most restrictive of the inputs, and whose covariance accounts for both. There is no silent upgrade from DEGRADED to VALID.

2. Validity ladder¶

Validity is a totally-ordered enum:

VALID > DEGRADED > STALE > INVALID

Value	Meaning	Consumers may
`VALID`	Inside published nominal envelope.	Use directly for control and mission decisions.
`DEGRADED`	Producer detected reduced quality but output is still informative. Covariance MUST reflect the degradation (inflated, not nominal).	Use for state estimation. Mission logic MUST consult the mode (see §3) before acting.
`STALE`	Estimate is older than its `max_age_ns` budget. Value is held; covariance is inflated by a documented growth model.	Use only for dead reckoning. Control loops MUST switch to fallback gains.
`INVALID`	Producer cannot supply a meaningful estimate.	MUST NOT use the value. Consumers fall through to last-known + dead reckoning, or trigger a perception failure mode (§3).

Validity is never derived implicitly from "looks reasonable" heuristics on the consumer side. The producer is authoritative.

3. Perception failure mode catalog¶

The system recognizes a closed set of perception failure modes. Each has a stable PerceptionMode identifier, a quantitative entry criterion, and a corresponding behavioral response (defined in ADR-0009; this ADR only fixes the catalog and the criteria).

Mode	Entry criterion	Exit criterion
`NOMINAL`	Default. All perceptual `Estimate.validity == VALID` for ≥ `nominal_hold_ms` (default 200 ms).	—
`LOW_TEXTURE`	VO feature count < `min_features` (default 30) for ≥ `low_texture_window_ms` (default 500 ms), OR mean feature-track length < `min_track_length` (default 5 frames).	Feature count ≥ `min_features × 1.5` for `nominal_hold_ms`.
`LOW_LIGHT`	Mean luminance < `min_luminance` (default 0.05 normalized) OR camera AGC saturated at max gain for ≥ `low_light_window_ms` (default 1000 ms).	Luminance ≥ `min_luminance × 2.0` for `nominal_hold_ms`.
`IMU_SATURATION`	Any IMU axis at `imu_saturation_threshold` (default 90 % of full scale) for ≥ `imu_saturation_window_ms` (default 50 ms).	All axes < `imu_saturation_threshold × 0.7` for `nominal_hold_ms`.
`VIO_LOST`	EKF innovation gate fails for ≥ `innovation_fail_count` consecutive updates (default 5), OR no VO update received for `vio_timeout_ms` (default 200 ms).	New VIO `Estimate.validity == VALID` accepted by the gate for `nominal_hold_ms`.
`MAP_AMBIGUOUS`	Loop-closure candidate score within `loop_closure_ambiguity_margin` (default 0.1) of the second-best for ≥ `ambiguity_window_ms` (default 500 ms).	Best candidate clears the second-best by `margin × 2`.
`PERCEPTION_DEAD`	All of `LOW_TEXTURE`, `LOW_LIGHT`, `VIO_LOST` simultaneously, OR every perceptual producer reports `validity == INVALID`.	All producers report `VALID` for `nominal_hold_ms × 2`.

All thresholds live in docs/specs/uncertainty.md and configs/perception/*.yaml; they are tunable per scenario but the names of the modes are frozen.

4. Mode transitions are events¶

Every transition between perception modes is published as an Event on the bus (per ADR-0006) with severity ≥ WARN. Transitions are logged to telemetry. This guarantees that any incident in simulation or hardware leaves a trace in MCAP that can be replayed (per ADR-0003).

5. Estimator obligations¶

State estimators (Phase 3 onwards) MUST:

Tag every output NavigationState with the highest-restrictive validity among their inputs over the last validity_window_ms.
Inflate covariance_15x15 whenever any input is DEGRADED or STALE, using the inflation models documented in docs/specs/uncertainty.md. No silent fallback to nominal covariance.
Publish a PERCEPTION_MODE_CHANGED event on every transition.

6. What this ADR does not decide¶

Behavioral response per mode (hover, slow ascent, blind RTL, kill) — that is ADR-0009.
Concrete VO/SLAM algorithm choice — that is a Phase 4 decision.
Exact covariance inflation models — those live in docs/specs/uncertainty.md and may evolve without an ADR, as long as the contract above holds.
ML-based perception failure detection — explicitly out of scope. ML may be an opt-in consumer of the catalog, never a replacement.

Consequences¶

Positive.

A single, named vocabulary for perception failure across the codebase. Code review can reject ad-hoc thresholds.
Telemetry and replay carry mode transitions automatically, so any flight incident is reproducible.
ADR-0009 can be a thin behavioral policy on top of a stable mechanism layer.
Estimate[T] makes covariance presence a type-level concern. A consumer that forgets to check validity is visible in review.
Backends that lack ground truth (Gazebo, hardware) are not penalized: Estimate.covariance == None is illegal there by construction.

Negative.

Estimate[T] wrapping adds boilerplate at every module boundary. Mitigated by helpers in core.uncertainty.
The catalog is closed. Adding a new mode requires a new ADR (or an explicit "this ADR supersedes 0008" if the change is structural). This is deliberate — silent expansion of failure-mode taxonomies is exactly how SLAM stacks rot.
Thresholds need empirical tuning per scenario. Default values above are starting points, not promises.
Estimators must implement covariance inflation explicitly; this is non-trivial work in Phase 3.

Alternatives considered¶

A. Implicit confidence scalars only. Producers emit a single confidence ∈ [0, 1]; consumers decide what to do. Rejected: opacifies what "0.4 confidence" means across modules; covariance information is lost; mission code accumulates magic numbers.

B. Per-module ad-hoc failure flags. Each consumer invents its own checks. Rejected: this is the status quo we are escaping. It produces untestable behavior under degradation.

C. Continuous "health score" instead of discrete modes. Single scalar per perception channel. Rejected: control and mission code want to dispatch on a few qualitative states, not threshold-on-a-float. A continuous score can still exist inside a mode (for fine-grained covariance inflation), but the boundary contract is discrete.

D. Learned failure detector (anomaly model). A model classifies perception quality. Rejected as the primary mechanism: not interpretable, not deterministic, not auditable, breaks replay. May be added later as an opt-in producer whose output flows through the same Estimate envelope.

E. Defer the decision to ADR-0009. Reject the split; put modes and behavior in one document. Rejected: mechanism (this ADR) and policy (ADR-0009) have different change rates. The catalog is structural and should not be coupled to behavioral debates.

F. Optional[T] instead of Estimate[T]. Use Python's standard option type and rely on documentation for uncertainty. Rejected: option type collapses STALE and INVALID into None; covariance has nowhere to live; producers cannot communicate degradation without losing the value.