ADR-0009 — Autonomy Under Uncertainty¶
- Status: Accepted
- Date: 2026-06-04
Context¶
ADR-0008 freezes the mechanism layer for perception failure: a typed Estimate[T] envelope, a Validity ladder, and a closed catalog of PerceptionMode values with quantitative entry criteria. It deliberately stops short of saying what the vehicle should do in each mode.
ADR-0000 commits the project to a single behavioral principle: the system must know when it knows, know when it does not know, and alter its behavior accordingly. The architecture red-team review (§2.6) is sharper: without a documented degradation policy, the GPS-denied autonomy claim is marketing.
A behavioral policy under uncertainty has to resolve, at minimum, four questions:
- What does the vehicle do when it enters each perception mode? (Hover? RTL? Land? Hand to pilot?)
- Who has authority — autonomous controller, pre-recorded recovery routine, human pilot, safety supervisor?
- How does the mission layer reason about uncertainty when planning, not only when reacting?
- What invariants are non-negotiable even under degraded perception (kill switch, no-fly geometry, max altitude)?
These are policy decisions with a different change rate than the mechanism. They will be revisited as the project moves from PyBullet to Gazebo+PX4 to hardware. Putting them in a separate ADR keeps the mechanism stable while the policy evolves.
This ADR fixes the policy for Phases 1–6. Phases 7+ (real hardware) may add stricter rules without breaking this contract; they may not loosen it.
Decision¶
Project Ghost adopts a three-tier autonomy stack in which perception modes (ADR-0008) map to authority transitions, mission-level uncertainty reasoning is explicit, and a small set of safety invariants is enforced unconditionally.
1. Tiered autonomy¶
| Tier | Name | Authority over actuators | When active |
|---|---|---|---|
| T0 | Safety supervisor | Veto on every command; cannot generate motion of its own except STOP/KILL. |
Always. |
| T1 | Reflex layer | Closed-loop stabilization, attitude/rate control, dead reckoning hold. Deterministic, no perception input. | Always running; takes command when T2/T3 yield. |
| T2 | Reactive layer | Behaviors per perception mode (hover, slow ascend, blind RTL). Consumes Estimate[T] but uses no global map. |
Engaged in DEGRADED modes per §2. |
| T3 | Deliberative layer | Mission planning, SLAM consumer, frontier exploration. Trusts VALID estimates. |
Engaged only in NOMINAL mode. |
Authority flows downward when uncertainty rises. T3 yields to T2 when leaving NOMINAL. T2 yields to T1 in PERCEPTION_DEAD. T0 may pre-empt any tier at any time.
2. Behavioral response per perception mode¶
The mapping below is the canonical policy. Modules implementing reactive behaviors MUST consult this table; deviations require an ADR amendment.
PerceptionMode |
Active tier | Behavior |
|---|---|---|
NOMINAL |
T3 | Mission plan executes normally. |
LOW_TEXTURE |
T2 | Reduce horizontal speed to low_texture_max_speed_mps (default 0.5). Continue mission if path is collision-free per last known map; otherwise hover and request replan. |
LOW_LIGHT |
T2 | Stop forward motion. Hold position with inflated covariance. Emit RECOVERY_REQUESTED event. After low_light_recovery_timeout_ms (default 5000), ascend slowly at slow_ascend_mps (default 0.3) toward recovery_altitude_m (default 2.0 above takeoff) seeking texture. |
IMU_SATURATION |
T1 + T0 | T2/T3 suspended for imu_recovery_hold_ms (default 200). T1 reduces commanded body rates to zero. T0 inspects: if saturation persists > imu_kill_threshold_ms (default 1000), escalate to KILL. |
VIO_LOST |
T2 | Hover on dead reckoning for dr_hover_window_ms (default 3000). If VIO does not recover, attempt blind RTL toward takeoff coordinate using IMU integration with growing covariance. If covariance crosses dr_abort_covariance, switch to controlled descent. |
MAP_AMBIGUOUS |
T2 | Reject loop closures while ambiguous. Continue VO-only navigation. Do not commit map updates. Replan to revisit a previously confident landmark if available. |
PERCEPTION_DEAD |
T1 + T0 | All deliberative and reactive control suspended. T1 commands controlled descent at dead_descent_mps (default 0.5). T0 monitors altitude; cuts thrust at kill_altitude_m (default 0.3 above takeoff). |
Every mapping above is a default. Scenario configs (configs/missions/*.yaml) may override numeric thresholds but may not change the active tier or the qualitative behavior.
3. Uncertainty-aware mission planning (T3)¶
The deliberative layer treats uncertainty as a planning input, not only a runtime signal.
- Plans are scored against an expected information gain term and a risk term. The risk term integrates expected
validityalong the planned path, using a forward uncertainty model documented indocs/specs/mission.md. - The planner rejects plans whose worst-case predicted covariance exceeds
mission_max_covariance_normover a contiguous segment longer thanmission_max_blind_segment_m(default 5). - When uncertainty is unavoidable (e.g. crossing a textureless area is required to reach the goal), the planner inserts an explicit active perception sub-goal: slow ascent, yaw scan, or revisit of a known landmark before committing.
- Goals carry an
uncertainty_budget: the planner records, accepts, and exposes the maximum tolerated covariance for each goal. Mission code may not silently exceed the budget.
These mechanisms exist from Phase 5 onward; Phases 1–4 satisfy the contract trivially (no planner, no mission).
4. Safety invariants (T0)¶
The safety supervisor enforces invariants that no tier may violate, regardless of perception state. They are evaluated on every actuator command before transmission.
| Invariant | Default | Override |
|---|---|---|
max_altitude_m |
30.0 (sim), 5.0 (hardware) | Scenario config; hardware floor ≤ scenario.max_altitude. |
geofence_polygon |
Scenario-defined | Must be present for any non-empty_room scenario. |
max_battery_age_s |
1200 | Hardware-only; ignored in sim. |
command_freshness_ms |
200 | Same as ADR-0001 command_timeout_ns. |
nan_inf_reject |
Always on | Cannot be disabled. |
kill_on_loss_of_comms_ms |
None in sim; 5000 on hardware | Cannot be loosened on hardware. |
Any violation produces a SAFETY_VIOLATION event with severity CRITICAL, an immediate STOP or KILL command per a small lookup table, and a persistent telemetry record.
5. Human override¶
When a pilot input channel is present (manual flight, Phase 1; HIL, Phase 8; hardware, Phase 9), pilot input is treated as a T2-level authority by default and is upgradable to T0 by an explicit kill or RTL command.
- Pilot input never re-enables a tier the safety supervisor has disabled.
- Pilot input cannot disarm safety invariants from §4.
- A pilot may force the vehicle into a worse mode (e.g. demand aggressive maneuver under
LOW_TEXTURE); the system complies but logs aPILOT_OVERRIDE_DEGRADEDevent with the active mode and validity at the time.
6. Honesty obligations¶
This section makes explicit a set of practices that follow from the principle "know when you do not know":
- No estimator may publish
validity == VALIDwhen its input covariance is missing or its innovation gate is failing. The default in ambiguity isDEGRADED. - No planner may consume an
Estimatewithout explicitly readingvalidity. Static analysis (Phase 5 task) will enforce this. - Every
RECOVERY_REQUESTEDevent must specify which producer asked for recovery and why. Anonymous recovery requests are rejected. - Replays surface mode history as a first-class telemetry channel (
/perception/mode). Any review of an incident starts there.
Consequences¶
Positive.
- A single document maps every perception failure mode to a behavior. Reviewers can audit the table directly.
- Authority transitions are explicit and testable. T0 is the only tier with veto power; this is the structural reason hardware deployment will not require rewriting upper layers.
- Mission planning that is honest about uncertainty is now a contract, not an aspiration.
- Safety invariants are centralized; hardware can tighten them without touching reactive or deliberative code.
- The pilot model is realistic about how humans actually interact with semi-autonomous systems: they intervene under stress, often making things worse. The system records this rather than denying it.
Negative.
- The table in §2 is opinionated. Some choices (e.g. blind RTL after
VIO_LOST) will need empirical revision. Each revision is an ADR amendment, which is friction. Justified because silent policy drift is worse. - Active perception in mission planning (§3) is non-trivial to implement; it is deferred to Phase 5+ but the contract is fixed now.
- The pilot override semantics (§5) constrain UX design: the system cannot offer a "trust me, disable safety" mode. This is intentional.
- Hardware-only invariants (battery age, comms loss kill) create a divergence between sim and hardware behaviors. The divergence is bounded and documented, not hidden.
Alternatives considered¶
A. Single monolithic autonomy layer. No T1/T2/T3 split; one controller does everything with a state machine inside. Rejected: experience in PX4, ArduPilot, and academic stacks shows that mixing reflex with deliberation makes both untestable. The tiered structure is widely validated.
B. Behavior trees as the policy language. Adopt a BT framework (py_trees, BehaviorTree.CPP) for §2. Rejected for Phase 1: introduces a runtime dependency and a debugger surface ahead of need. May be revisited at Phase 5 if the table in §2 outgrows hand-written dispatch. The contract here would survive that change.
C. Continuous-mode controller (no discrete modes). Mode-free MPC with uncertainty in the cost function. Rejected: not auditable, not interpretable to a pilot, hard to replay-debug. May coexist with this ADR inside a single mode (e.g. NOMINAL) but does not replace the discrete dispatch.
D. Pilot as T0. Promote the human pilot above the safety supervisor. Rejected: real pilots make wrong calls under stress (this is the whole reason for autonomy). Safety supervisor invariants are a property of the airframe and its environment, not of pilot intent.
E. Defer policy to "when we have hardware". Phase 0 punts on the policy, builds the mechanism, and writes the ADR in Phase 8. Rejected: every Phase 1–7 design decision that touches actuator commands, mission planning, or estimator output depends on knowing what the policy will be. Deferring leaks ad-hoc policy into every layer.
F. Learn the policy from data (RL). A learned controller chooses the response per mode. Rejected as the primary policy: not safe to deploy on hardware, not interpretable, breaks replay determinism. Acceptable as an opt-in sub-policy within a mode, when its outputs flow through the same actuator boundary and T0 invariants.