Legacy — Validation

SECTION 01

The validation gap

The planner’s career is often shorter than the time it takes for the plan to be tested. Most plans are never publicly tested at all.

The customer pays for something whose quality cannot be verified for decades

SECTION 02

How traditional reputation actually accretes

Six mechanisms. None are short-loop. The shortest is “peer referral” at 5–10 years; the longest is multi-generational pedigree.

Customer picks by referral plus surface signals · quality verification is post-hoc, often post-mortem

SECTION 03

The asymmetric play — legibility as validation

Skip the impossible. Make every recommendation auditable in under 15 minutes by any competent professional. Their sign-off becomes the short-loop signal.

The 30-year loop stays unsolved · the engine adds a 7-day loop alongside it

SECTION 04

Short-loop signals — what to actually measure

Seven signals you can capture without waiting for anyone to die. Each has a loop length and a plain-English read on what a good number looks like.

Professional sign-off rate

DAYS

When the rec gets sent to Boone, does he say “yes, this is right” or “no, you missed something”? Track the ratio across hundreds of recs. A healthy engine’s sign-off rate climbs over time as the catalog matures.
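A minimal sketch of tracking that ratio over time. It assumes each review is logged with the week it happened and a yes/no verdict. The `Review` record, `sign_off_rate` and `rate_by_window` are hypothetical names, not part of any existing system. The expectation that the rate should climb comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One professional review of one recommendation (hypothetical record shape)."""
    week: int         # week the recommendation was sent for review
    signed_off: bool  # True = "yes, this is right"; False = "you missed something"


def sign_off_rate(reviews: list[Review]) -> float:
    """Fraction of reviews that ended in sign-off."""
    if not reviews:
        raise ValueError("no reviews to score")
    return sum(r.signed_off for r in reviews) / len(reviews)


def rate_by_window(reviews: list[Review], window_weeks: int) -> dict[int, float]:
    """Sign-off rate per block of `window_weeks` weeks, keyed by the block's first week."""
    buckets: dict[int, list[Review]] = {}
    for r in reviews:
        start = (r.week // window_weeks) * window_weeks
        buckets.setdefault(start, []).append(r)
    return {start: sign_off_rate(rs) for start, rs in sorted(buckets.items())}


reviews = [Review(w, ok) for w, ok in [(0, True), (1, False), (2, False), (3, False),
                                       (4, True), (5, True), (6, False), (7, True)]]
print(rate_by_window(reviews, 4))  # {0: 0.25, 4: 0.75}: the rate is climbing
```

Windowing matters because a single lifetime ratio hides the trend. The signal the page cares about is the direction of travel as the catalog matures.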

Confidence calibration

MONTHS

The engine said “65% confident” on 100 recommendations. If 60–70 of them survive sign-off and the rest get rejected, the 65% was honest. If only 40 survive, the engine is bluffing: the stated number claims more certainty than the underlying analysis supports.
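The same check can be run for every confidence level at once. This sketch assumes each recommendation is logged as a (stated confidence, survived sign-off) pair. The 5-point tolerance in `bluffing_buckets` is an illustrative assumption, not a threshold from this page, and both function names are hypothetical.

```python
from collections import defaultdict


def calibration_table(recs):
    """recs: iterable of (stated_confidence, survived_sign_off) pairs.

    Returns {stated_confidence: (count, observed_survival_rate)}, sorted by confidence.
    """
    counts = defaultdict(lambda: [0, 0])  # stated confidence -> [total, survived]
    for stated, survived in recs:
        counts[stated][0] += 1
        counts[stated][1] += int(survived)
    return {c: (n, s / n) for c, (n, s) in sorted(counts.items())}


def bluffing_buckets(table, tolerance=0.05):
    """Confidence levels whose observed survival falls more than `tolerance` below the stated level."""
    return [c for c, (_, observed) in table.items() if observed < c - tolerance]


# The example from the text: 100 recommendations tagged 65%.
bluffing = calibration_table([(0.65, i < 40) for i in range(100)])  # only 40 survive
honest = calibration_table([(0.65, i < 65) for i in range(100)])    # 65 survive
print(bluffing_buckets(bluffing), bluffing_buckets(honest))  # [0.65] []
```

Only under-performance is flagged here, because that is the bluffing case the text describes. An engine that is systematically underconfident would need the mirror-image check.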

Critique convergence

MINUTES

When the principal pushes back on a recommendation, does the engine settle in 1–2 turns (“re-examined the specific claim, here’s what changed”) or spiral through 5 reversals chasing the user’s framing? Convergence is the failure-mode test from reflection.md.
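One way to make “settles in 1–2 turns” measurable. This assumes the engine’s stance after each critique turn can be reduced to a comparable label. `convergence_stats`, `converged` and the two-turn cutoff are illustrative assumptions, with the cutoff taken from the 1–2 turn figure above.

```python
def convergence_stats(positions):
    """positions[0] is the original recommendation; positions[i] is the engine's
    stance after critique turn i. Returns (reversals, turns_to_settle), where
    turns_to_settle is the last turn on which the stance changed (0 if it never did)."""
    reversals = 0
    last_change = 0
    for turn in range(1, len(positions)):
        if positions[turn] != positions[turn - 1]:
            reversals += 1
            last_change = turn
    return reversals, last_change


def converged(positions, max_turns=2):
    """True if the stance stopped moving within `max_turns` critique turns."""
    _, settle = convergence_stats(positions)
    return settle <= max_turns


settles = ["A", "A'", "A'", "A'"]             # one revision, then stable
spirals = ["A", "B", "A", "B", "A", "B"]      # flips on every push
print(convergence_stats(settles), converged(settles))  # (1, 1) True
print(convergence_stats(spirals), converged(spirals))  # (5, 5) False
```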

Reflection rate

WEEKS

How often does the user catch an engine error that the engine itself didn’t flag? Should be non-zero (the engine isn’t perfect) but trending down over time. Zero is a red flag — the engine is hiding failures rather than avoiding them.
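A sketch of the weekly read. It assumes per-week counts of errors the user caught that the engine did not flag, plus the number of recommendations reviewed that week. The trend is an ordinary least-squares slope over the week index. The function names and the three verdict strings are hypothetical.

```python
def weekly_rates(caught, reviewed):
    """caught[i]: errors the user caught that the engine did not flag in week i;
    reviewed[i]: recommendations reviewed in week i (must be > 0)."""
    if len(caught) != len(reviewed):
        raise ValueError("need one count per week in both lists")
    return [c / r for c, r in zip(caught, reviewed)]


def ols_slope(values):
    """Ordinary least-squares slope of values against their week index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two weeks")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def read_reflection(rates):
    """Apply the page's reading: zero is a red flag; non-zero should trend down."""
    if all(r == 0 for r in rates):
        return "red flag: no user catches at all"
    if ols_slope(rates) < 0:
        return "healthy: non-zero and trending down"
    return "watch: not trending down"


rates = weekly_rates(caught=[6, 5, 3, 2], reviewed=[50, 50, 50, 50])
print(read_reflection(rates))  # healthy: non-zero and trending down
```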

Catalog growth

QUARTERS

How many new statute-level traps did we add this quarter? Slow steady growth = healthy maturity. Sudden spike = we were missing things. Zero growth = we stopped looking. Track the rate, not just the total.
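A sketch of reading the quarterly rate rather than the running total. The baseline (median of earlier quarters) and the 3× spike factor are assumptions for illustration. The page does not specify thresholds, and `read_quarter` is a hypothetical name.

```python
from statistics import median


def read_quarter(history, this_quarter, spike_factor=3.0):
    """history: new statute-level traps added in each earlier quarter;
    this_quarter: traps added in the current quarter."""
    if this_quarter == 0:
        return "zero growth: we stopped looking"
    # With no non-zero baseline yet, no quarter is called a spike.
    baseline = median(history) if history else 0
    # A quarter far above the usual pace suggests a gap we hadn't noticed.
    if baseline > 0 and this_quarter > spike_factor * baseline:
        return "spike: we were missing things"
    return "steady growth"


past = [4, 5, 3, 4]
for added in (5, 20, 0):
    print(added, read_quarter(past, added))
```

The median is used rather than the mean so that one earlier spike does not inflate the baseline and mask the next one.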

Adversarial agreement

DAYS

Show the same recommendation to 3 independent professionals. 3-of-3 agree with the engine ⇒ strong signal. 0-of-3 agree ⇒ the engine sees something the field doesn’t (and is probably wrong). Cheap to run on the highest-stakes recommendations.
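A tally sketch for the three-reviewer check. The page only describes the 3-of-3 and 0-of-3 cases. Treating 1-of-3 and 2-of-3 as “mixed” is an assumption, as is the `read_agreement` name.

```python
def read_agreement(votes):
    """votes: one bool per independent professional; True means they agree with the engine."""
    if not votes:
        raise ValueError("need at least one reviewer")
    agree, total = sum(votes), len(votes)
    if agree == total:
        return f"{agree}-of-{total}: strong signal"
    if agree == 0:
        return f"0-of-{total}: engine diverges from the field and is probably wrong"
    # Assumption: split verdicts are not covered by the page; flag them for a closer look.
    return f"{agree}-of-{total}: mixed"


print(read_agreement([True, True, True]))     # 3-of-3: strong signal
print(read_agreement([True, False, True]))    # 2-of-3: mixed
```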

Intermediate-event survival

YEARS

When a real event (audit, divorce, business sale, restructuring) happens to a planned estate, did the plan hold up — or did it need emergency rework? Each event is a partial real-world test. This is the only signal that touches actual outcomes, and even it doesn’t need to wait for death.

SECTION 05

Publishing the catalog — the community moat

The traditional industry is secretive. A published, citable catalog invites a tight professional community of roughly 50–100 people to engage. They critique; we improve; the catalog becomes the canonical reference. This is feasible precisely because it isn’t mass-market publication.

A moat the secretive industry doesn’t have · not mass publication · tight professional community

WHY THIS WORKS

Critique is the quality loop

Every pro who reads an entry and disagrees produces a verification event. That event either confirms (entry survives), refines (we add a fact pattern), or rejects (we update or supersede). The catalog improves faster than any private internal review could deliver.
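The three outcomes map naturally onto an update rule for a catalog entry. This is a sketch under assumptions: the `CatalogEntry` fields and `apply_critique` are hypothetical. For brevity, rejection is modelled only as supersession, although the text also allows an in-place update.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CONFIRM = "confirm"  # entry survives unchanged
    REFINE = "refine"    # a new fact pattern is added
    REJECT = "reject"    # entry is superseded (an in-place update is the other option)


@dataclass
class CatalogEntry:
    entry_id: str
    fact_patterns: list = field(default_factory=list)
    confirmations: int = 0
    superseded_by: Optional[str] = None


def apply_critique(entry: CatalogEntry, outcome: Outcome,
                   detail: Optional[str] = None) -> CatalogEntry:
    """Record one verification event from a professional's critique."""
    if outcome is Outcome.CONFIRM:
        entry.confirmations += 1
    elif outcome is Outcome.REFINE:
        if detail is None:
            raise ValueError("a refinement needs the new fact pattern")
        entry.fact_patterns.append(detail)
    elif outcome is Outcome.REJECT:
        if detail is None:
            raise ValueError("a rejection needs the id of the superseding entry")
        entry.superseded_by = detail
    return entry


entry = CatalogEntry("trap-001")  # hypothetical id
apply_critique(entry, Outcome.CONFIRM)
apply_critique(entry, Outcome.REFINE, "hypothetical fact pattern B")
```

Every critique leaves a trace on the entry, so the history of confirmations and refinements is itself part of what gets published.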

WHAT TO WATCH

Compartmentalisation is still required

Publish the catalog: the statute-level traps, the authorities, the per-entry confidence. Never publish a client’s actual graph: entities, balances, citizenships, family. The catalog is the universal layer; the graph is the private layer. It is the same scope-fence model as the Scopes page.
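One way to enforce the fence is to make it a type boundary. Only catalog entries can be serialised for publication, and client-graph objects are rejected outright. The class names and fields are illustrative assumptions based on the two lists above, not the Scopes page’s actual model.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Universal layer: safe to publish."""
    trap: str
    authorities: tuple  # citations supporting the entry
    confidence: float   # per-entry confidence


@dataclass(frozen=True)
class ClientGraph:
    """Private layer: never leaves the client's scope."""
    entities: tuple
    balances: dict
    citizenships: tuple
    family: tuple


def publish(items) -> str:
    """Serialise catalog entries for publication; refuse anything else outright."""
    items = list(items)
    for item in items:
        if not isinstance(item, CatalogEntry):
            raise TypeError(f"only CatalogEntry is publishable, got {type(item).__name__}")
    return json.dumps([asdict(e) for e in items], sort_keys=True)


doc = publish([CatalogEntry("§64(d) cascade", ("example authority (hypothetical)",), 0.65)])
```

Because the catalog type has no field that could hold client data, a leak requires passing the wrong object, and that fails loudly instead of silently publishing a partial graph.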

SECTION 06

The Goodhart trap

If 65%-confidence recs get more sign-offs, the engine will learn to tag more recommendations at 65%, regardless of the underlying epistemics. The signal stops measuring quality and starts measuring “what the signal rewards.” Three guardrails break the cycle.

Three independent checks · any one of them breaks the feedback flywheel
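Whatever guardrails are chosen, the symptom itself is observable: the share of recommendations tagged at the rewarded level rises from period to period. Below is a minimal detector. It assumes confidence tags are logged per period. The 15-point jump threshold and the function names are hypothetical.

```python
from collections import Counter


def tag_share(confidences, level):
    """Fraction of one period's recommendations tagged at exactly `level`."""
    if not confidences:
        raise ValueError("empty period")
    return Counter(confidences)[level] / len(confidences)


def clustering_alerts(periods, level, max_jump=0.15):
    """periods: per-period lists of confidence tags, oldest first.
    Returns indices of periods where the share at `level` rose by more than `max_jump`."""
    shares = [tag_share(p, level) for p in periods]
    return [i for i in range(1, len(shares)) if shares[i] - shares[i - 1] > max_jump]


periods = [
    [0.50, 0.65, 0.80, 0.90],
    [0.50, 0.65, 0.80, 0.65],
    [0.65, 0.65, 0.65, 0.80],
]
print(clustering_alerts(periods, 0.65))  # [1, 2]
```

This only detects drift toward the rewarded tag. Whether a given rise is gaming or a genuine change in the caseload still needs one of the independent checks.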

SECTION 07

The grading criterion

Eventually a real failure will happen: a missed §64(d) cascade fires at first death; an estate gets reassessed in 2055. The tool’s reputation depends on whether the failure was visible in advance.

Grading happens at issuance · the long-term outcome is informative but not determinative
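A sketch of grading at issuance. It assumes each issued recommendation records which traps it disclosed. The grade looks only at that record, and the timing or severity of the eventual outcome is deliberately not an input. `IssuedRec`, `grade_failure`, and the id and date in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IssuedRec:
    rec_id: str
    issued_on: str           # issuance date, ISO 8601
    flagged_traps: frozenset  # traps disclosed to the client at issuance


def grade_failure(rec: IssuedRec, failed_trap: str) -> str:
    """Grade a later failure only against what the recommendation showed when issued."""
    if failed_trap in rec.flagged_traps:
        return "visible in advance"
    return "not visible in advance"


rec = IssuedRec("rec-17", "2025-03-01", frozenset({"§64(d) cascade"}))
print(grade_failure(rec, "§64(d) cascade"))     # visible in advance
print(grade_failure(rec, "reassessment risk"))  # not visible in advance
```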