Mechanical verification (TLA+)¶

Citation

ADR-0036 · Proposed 2026-06-09 · Specs: docs/proofs/BaudErur.tla

The property set is verified by three complementary layers:

Layer	Tool	Scope	Strength
Executable verifiers	`ghost verify-properties`	One specific MCAP	Per-trace
Property tests	Hypothesis	Random samples at production scale	Probabilistic
Mechanical verification	TLA+ / TLC	Full state space at small bounds	Exhaustive

This page covers the third layer.

What TLA+ adds¶

The Python verifiers prove the property holds on a specific captured MCAP. The Hypothesis property tests prove it holds on the inputs the test generator sampled. Neither proves the property holds on all possible inputs.

TLA+ with the TLC model checker closes that gap on a bounded abstract model. With M=2, K=1, W=3, TLC enumerates the entire reachable state space (low hundreds of states) and verifies all five invariants exhaustively. No random sampling — every state is visited.

The five invariants¶

BaudErur.tla defines and checks:

INV_BAUD — formal statement of BAUD-v1's precondition → postconditions implication.
INV_ERUR — formal statement of ERUR-v1's precondition → postconditions implication.
INV_PARTITION — the BAUD + ERUR partition theorem, proved over the full abstract state space. Promotes test_smoke_baud_and_erur_partition_the_cycle_space from "observed on one trace" to "proven on the model".
INV_NO_INVENTED_CONFIDENCE — bonus formal statement of MD-v1: the reference calibrator's adjusted level is always ≥ raw level in the lattice.
INV_HISTORY_BOUND — safety net on the sliding-window model: Len(history) <= W always. If this ever violates, the WindowUpdate definition is wrong and so is RLB-v1's structural assumption.

Running TLC¶

Requires Java 17+ and tla2tools.jar:

java -cp /path/to/tla2tools.jar tlc2.TLC \
    -config docs/proofs/BaudErur.cfg \
    docs/proofs/BaudErur.tla

Expected output: Model checking completed. No error has been found.

See the proofs README for full instructions.

Honest scope¶

The TLA+ artifact does NOT establish:

That the Python implementation faithfully mirrors the TLA+ model. The bridge between the two is by code inspection only — a single reviewer reading src/project_ghost/core/feedback/reference_policy.py and the Calibrate definition in BaudErur.tla and confirming they encode the same semantics. Future work could extract the TLA+ spec from the Python source automatically.
That the bounded constants prove the unbounded case. TLC is exhaustive over the finite state space defined by M=2, K=1, W=3. Behaviour at production constants (M=4, K=2, W=32) is covered by the property tests and CLI verifier, not the TLA+ proof.
That non-reference calibration policies satisfy the invariants. Each non-reference policy would need its own spec.

See ADR-0036 for the full framing.

Why TLA+ specifically, not Lean/Coq¶

The decision is in ADR-0036 §Alternatives. Summary: TLA+ delivers exhaustive model-checking at bounded scale in a few hours of work; Lean or Coq deliver unbounded theorem proofs at multi-week investment. The project's identity has been "make the property set citable end-to-end"; TLA+ continues that trajectory cost-effectively. A Lean proof remains a future candidate (potential ADR-0039).

Future work¶

ADR-0037 candidate: TLA+ specs for RLB-v1 and FPB-v1 in their own modules.
ADR-0038 candidate: CI job that runs TLC on every push, blocking merges that break any invariant.
ADR-0039 candidate: TLAPS proof of the partition theorem for unbounded W, M, K — replacing TLC's "exhaustive over bounded state space" with "proved for any finite W, M, K".