Mechanical verification (TLA+)¶
Citation
ADR-0036 ·
Proposed 2026-06-09 ·
Specs: docs/proofs/BaudErur.tla
The property set is verified by three complementary layers:
| Layer | Tool | Scope | Strength |
|---|---|---|---|
| Executable verifiers | ghost verify-properties |
One specific MCAP | Per-trace |
| Property tests | Hypothesis | Random samples at production scale | Probabilistic |
| Mechanical verification | TLA+ / TLC | Full state space at small bounds | Exhaustive |
This page covers the third layer.
What TLA+ adds¶
The Python verifiers prove the property holds on a specific captured MCAP. The Hypothesis property tests prove it holds on the inputs the test generator sampled. Neither proves the property holds on all possible inputs.
TLA+ with the TLC model checker closes that gap on a bounded abstract model. With M=2, K=1, W=3, TLC enumerates the entire reachable state space (low hundreds of states) and verifies all five invariants exhaustively. No random sampling — every state is visited.
The five invariants¶
BaudErur.tla
defines and checks:
- INV_BAUD — formal statement of BAUD-v1's precondition → postconditions implication.
- INV_ERUR — formal statement of ERUR-v1's precondition → postconditions implication.
- INV_PARTITION — the BAUD + ERUR partition theorem, proved over
the full abstract state space. Promotes
test_smoke_baud_and_erur_partition_the_cycle_spacefrom "observed on one trace" to "proven on the model". - INV_NO_INVENTED_CONFIDENCE — bonus formal statement of MD-v1: the reference calibrator's adjusted level is always ≥ raw level in the lattice.
- INV_HISTORY_BOUND — safety net on the sliding-window model:
Len(history) <= Walways. If this ever violates, the WindowUpdate definition is wrong and so is RLB-v1's structural assumption.
Running TLC¶
Requires Java 17+ and tla2tools.jar:
java -cp /path/to/tla2tools.jar tlc2.TLC \
-config docs/proofs/BaudErur.cfg \
docs/proofs/BaudErur.tla
Expected output: Model checking completed. No error has been found.
See the proofs README for full instructions.
Honest scope¶
The TLA+ artifact does NOT establish:
- That the Python implementation faithfully mirrors the TLA+ model.
The bridge between the two is by code inspection only — a single
reviewer reading
src/project_ghost/core/feedback/reference_policy.pyand theCalibratedefinition inBaudErur.tlaand confirming they encode the same semantics. Future work could extract the TLA+ spec from the Python source automatically. - That the bounded constants prove the unbounded case. TLC is
exhaustive over the finite state space defined by
M=2, K=1, W=3. Behaviour at production constants (M=4, K=2, W=32) is covered by the property tests and CLI verifier, not the TLA+ proof. - That non-reference calibration policies satisfy the invariants. Each non-reference policy would need its own spec.
See ADR-0036 for the full framing.
Why TLA+ specifically, not Lean/Coq¶
The decision is in ADR-0036 §Alternatives. Summary: TLA+ delivers exhaustive model-checking at bounded scale in a few hours of work; Lean or Coq deliver unbounded theorem proofs at multi-week investment. The project's identity has been "make the property set citable end-to-end"; TLA+ continues that trajectory cost-effectively. A Lean proof remains a future candidate (potential ADR-0039).
Future work¶
- ADR-0037 candidate: TLA+ specs for RLB-v1 and FPB-v1 in their own modules.
- ADR-0038 candidate: CI job that runs TLC on every push, blocking merges that break any invariant.
- ADR-0039 candidate: TLAPS proof of the partition theorem for unbounded W, M, K — replacing TLC's "exhaustive over bounded state space" with "proved for any finite W, M, K".