The alphabet

Nine planning principles. Roughly thirty predicates derived from them. Each predicate is a query against the graph — not a habit, not a prompt, not a checklist someone has to remember.

PRINCIPLE 01
Full asset inventory
every_asset_has_owner · no orphan Asset nodes
every_asset_has_beneficiary_direction
no_undocumented_dispute_point
PRINCIPLE 02
Five risk checks
marriage_risk_modelled
intestacy_outcome_acceptable
creditor_pierce_modelled
tax_defensible_by_jurisdiction
simultaneous_death_survivable
PRINCIPLE 03
Tools match goals
tool_states_goal · every proposed tool names what it enforces
simpler_alternative_ruled_out
PRINCIPLE 04
Plans maintained
trigger_event_unprocessed · e.g., new asset, birth, citizenship change
advisor_transition_briefed
PRINCIPLE 05
Incapacity, not just death
durable_poa_in_every_jurisdiction
medical_authority_designated
no_single_signer_company · sole director ⇒ paralysis
PRINCIPLE 06
Liquidity at death
24mo_liquidity_covered · without forced sale
insurance_sized_to_estate_tax
no_double_use_of_liquid_asset
PRINCIPLE 07
Jurisdictional conflicts
governing_law_chosen_per_asset
will_valid_where_asset_sits
no_unspoken_intestacy_default
PRINCIPLE 08
Citizenship is hard
no_us_person_in_bvi_role
pfic_safe_inheritance_path
kyc_uses_compartment_passport
pending_immigration_treated_as_done
PRINCIPLE 09
Family before institutions
family_candidates_exhausted
institution_choice_justified
no_default_corporate_trustee

Principles 8 and 9 interact in a fixed order: citizenship gate first, then family-first. A family candidate who fails the citizenship check cannot fill the role regardless of preference. This ordering is itself a predicate — role_assignment_filters_p8_before_p9 — checked whenever the engine suggests a person for a role.
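A minimal Python sketch of the ordering, assuming a person record with a citizenship set and a family flag. `Person`, `fails_citizenship_gate`, `rank_candidates` and the institution name are hypothetical. The gate is simplified to the citizenship test used by `no_us_person_in_bvi_role`; it is not a full US-person determination.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    citizenship: frozenset[str]
    is_family: bool


def fails_citizenship_gate(person: Person) -> bool:
    # Principle 8 gate for a BVI role, simplified to the citizenship test
    # used by no_us_person_in_bvi_role (other US-person tests are ignored).
    return "US" in person.citizenship


def rank_candidates(people: list[Person]) -> list[Person]:
    # Step 1 (P8): drop anyone who fails the gate, whatever the preference.
    eligible = [p for p in people if not fails_citizenship_gate(p)]
    # Step 2 (P9): family before institutions. sorted() is stable, so the
    # original order is kept inside each group.
    return sorted(eligible, key=lambda p: not p.is_family)


people = [
    Person("Example Trust Co", frozenset({"VG"}), is_family=False),  # hypothetical institution
    Person("Nate Hu", frozenset({"US", "CN"}), is_family=True),
    Person("Chenwen", frozenset({"CN", "MT"}), is_family=True),
]
ranked = rank_candidates(people)
print([p.name for p in ranked])  # ['Chenwen', 'Example Trust Co']
```

Because the gate filters before the family-first sort runs, Nate Hu never reaches the ranking, even though he is family.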

When the engine fires

Five hook points across the session lifecycle. The same predicate alphabet runs at each — the difference is what's being judged and what the engine does when it finds a problem.

T · SESSION START
Load context
Open items, recent decisions, what changed since last session, what needs principal attention.
T · ON INGEST
Validate write
New source ingested. Predicates fire on the new + adjacent nodes. Contradictions flagged on both pages.
T · ON QUERY
7-stage pipeline
Stages A–G run before any structural recommendation surfaces. See § pipeline below.
T · SESSION END
Crystallise
Conclusions, overrides, and reconciliations distilled into typed Analysis nodes.
T · SCHEDULED
Freshness sweep
Nightly background pass. Stale facts surfaced; low-confidence load-bearing facts flagged for refresh.
Same predicate alphabet · five different moments of evaluation
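One way to picture "same alphabet, five moments" is a single dispatcher: every hook runs the same predicate functions, and only the scope and the on-failure response change. This is a sketch under assumptions: the `Hook` enum, the `ON_FAILURE` wording (paraphrased from the five cards above) and the example predicate are hypothetical, not the engine's actual interface.

```python
from enum import Enum
from typing import Callable

# A predicate takes the nodes in scope and returns violation messages (empty = pass).
Predicate = Callable[[dict], list[str]]


class Hook(Enum):
    SESSION_START = "session start"
    ON_INGEST = "on ingest"
    ON_QUERY = "on query"
    SESSION_END = "session end"
    SCHEDULED = "scheduled"


# Assumed response per hook when something fails, paraphrased from the cards.
ON_FAILURE = {
    Hook.SESSION_START: "add to briefing as needing principal attention",
    Hook.ON_INGEST: "pause ingest and flag both pages",
    Hook.ON_QUERY: "withhold the recommendation",
    Hook.SESSION_END: "record in a typed Analysis node",
    Hook.SCHEDULED: "request a refresh",
}


def run_hook(hook: Hook, predicates: list[Predicate], scope: dict) -> tuple[str, list[str]]:
    # Same predicates at every hook; only the scope and the response differ.
    violations = [msg for predicate in predicates for msg in predicate(scope)]
    action = ON_FAILURE[hook] if violations else "log pass-state"
    return action, violations


def every_asset_has_owner(scope: dict) -> list[str]:
    # Hypothetical scope shape: {"assets": {asset_id: [owner, ...]}}.
    return [f"orphan asset: {a}" for a, owners in scope.get("assets", {}).items() if not owners]


print(run_hook(Hook.ON_INGEST, [every_asset_has_owner], {"assets": {"asset-1": ["Dan Hu"]}}))
print(run_hook(Hook.ON_INGEST, [every_asset_has_owner], {"assets": {"asset-2": []}}))
```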

One rule, walked through

Principle 8 · no_us_person_in_bvi_role · firing when USLP submits the BVI HoldCo KYC pack for ingest, 2026-05-10.

GRAPH STATE

What the engine reads

VEHICLE BVI HoldCo
edge party_of (Dan Hu, shareholder, 50%)
edge party_of (Chenwen, shareholder, 50%)
Dan Hu citizenship: [CN, MT, GD]
Dan Hu tax_residency: NRA · conf 1.0
Chenwen citizenship: [CN, MT]
Chenwen tax_residency: NRA · conf 1.0
Nate Hu citizenship: [US, CN] · child · not in party_of
KYC pack passports submitted: MT251400, MT251461
PREDICATES

What the engine checks

no_us_person_in_bvi_role

For each party_of edge on a BVI vehicle, the party's citizenship must not contain US.

PASS Dan: no US
PASS Chenwen: no US

kyc_uses_compartment_passport

For BVI KYC of a Chinese-national principal, prefer a Malta or Grenada passport. Submitting a Chinese passport is a flag.

PASS Dan: MT251400 used
PASS Chenwen: MT251461 used
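The two checks can be written as plain functions over the graph state above. A sketch, assuming each `party_of` edge is held as a `Party` record and each submitted passport as an (issuing country, number) pair; Chenwen's passport is assumed to be Maltese from the MT prefix. Class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Party:
    # One party_of edge on the vehicle.
    name: str
    role: str
    citizenship: frozenset[str]
    kyc_passport: Optional[tuple[str, str]] = None  # (issuing country, number) in the KYC pack


@dataclass
class Vehicle:
    name: str
    jurisdiction: str
    parties: list[Party]


def no_us_person_in_bvi_role(v: Vehicle) -> list[str]:
    # For each party_of edge on a BVI vehicle, citizenship must not contain US.
    if v.jurisdiction != "BVI":
        return []
    return [f"{p.name} ({p.role}) is a US citizen" for p in v.parties if "US" in p.citizenship]


def kyc_uses_compartment_passport(v: Vehicle) -> list[str]:
    # For Chinese nationals on a BVI vehicle, a Chinese passport in the KYC pack is a flag.
    if v.jurisdiction != "BVI":
        return []
    return [
        f"{p.name}: Chinese passport submitted for BVI KYC"
        for p in v.parties
        if "CN" in p.citizenship and p.kyc_passport is not None and p.kyc_passport[0] == "CN"
    ]


holdco = Vehicle("BVI HoldCo", "BVI", [
    Party("Dan Hu", "shareholder", frozenset({"CN", "MT", "GD"}), ("MT", "MT251400")),
    Party("Chenwen", "shareholder", frozenset({"CN", "MT"}), ("MT", "MT251461")),
])
# Nate Hu ([US, CN]) has no party_of edge, so neither predicate looks at him.
print(no_us_person_in_bvi_role(holdco), kyc_uses_compartment_passport(holdco))  # [] []
```

If Nate Hu were ever added as a party, the first predicate would return one violation and the ingest would pause.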
OUTCOME

What the principal sees

Briefing line: “14 entities read · 0 rules violated. Citizenship compartmentalisation satisfied.”

No prompt to the principal. The check ran silently. The pass-state is logged so a future audit can reconstruct which version of the predicate cleared which version of the graph.

If any predicate had failed, the engine would have paused the ingest, surfaced the failure, and asked: “Acknowledge, override with rationale, or repair?”

When the engine refuses to compute

Every fact carries a confidence score, a valid_from, and a last_verified. Load-bearing decisions (tax, KYC, deed) require fresh, high-confidence inputs. The engine knows what it's standing on.

Fact | Confidence | Source & date | Engine behaviour
Dan Hu · malta_passport · MT251400 | 1.00 · load-bearing | Passport scan · 2025-11 | USE: Eligible for BVI KYC. No prompt.
Nate Hu · us_citizenship | 1.00 · load-bearing | US birth cert · 2011 | USE: PFIC predicate runs against this fact.
Pomelo · expected_birth_year | 0.50 · owner intent only | Conversation · 2026-03 | SHOW + TOOLTIP: Used for forward-look planning. Hover surfaces “owner intent only.”
Tao Qing · current_address | 0.30 · stale 2019 | Court file · 2019-02-27 | REFRESH REQUESTED: Not load-bearing for current tasks. Flagged in nightly sweep.
Nate Hu · mailing_address · for Form 3520 | 0.65 · below threshold | Tao Qing self-report · 2024 | REFUSE TO COMPUTE: Form 3520 amendment requires conf ≥ 0.95. Engine pauses; principal must confirm a current address before drafting.

The threshold is not arbitrary. It comes from the cost of being wrong: a passport number used for KYC at the wrong confidence is a fileable mistake, and so is an heir’s mailing address used on an IRS form. The engine refuses not because it doesn’t know how to compute, but because the source isn’t strong enough.
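The behaviour column can be reproduced with a small decision function. This is a sketch: the 0.95 threshold is the one stated for tax, KYC and deed inputs, but the `stale` flag and the rule that non-load-bearing facts below 0.95 are shown with a tooltip are assumptions chosen to match the five rows in the table. `Fact` and `behaviour` are hypothetical names.

```python
from dataclasses import dataclass

LOAD_BEARING_MIN = 0.95  # stated threshold for tax, KYC and deed inputs


@dataclass
class Fact:
    subject: str
    attribute: str
    confidence: float
    stale: bool = False  # assumption: set by the nightly freshness sweep


def behaviour(fact: Fact, load_bearing_use: bool) -> str:
    if load_bearing_use:
        # Tax filing, KYC, deed: require a fresh, high-confidence input; never guess.
        ok = fact.confidence >= LOAD_BEARING_MIN and not fact.stale
        return "USE" if ok else "REFUSE TO COMPUTE"
    if fact.stale:
        return "REFRESH REQUESTED"
    if fact.confidence < LOAD_BEARING_MIN:
        # Usable for forward-look planning, but the weak source stays visible.
        return "SHOW + TOOLTIP"
    return "USE"


rows = [
    (Fact("Dan Hu", "malta_passport", 1.00), True),           # BVI KYC
    (Fact("Nate Hu", "us_citizenship", 1.00), True),          # PFIC predicate
    (Fact("Pomelo", "expected_birth_year", 0.50), False),     # forward-look planning
    (Fact("Tao Qing", "current_address", 0.30, stale=True), False),
    (Fact("Nate Hu", "mailing_address", 0.65), True),         # Form 3520 amendment
]
for fact, load_bearing in rows:
    print(f"{fact.subject} · {fact.attribute}: {behaviour(fact, load_bearing)}")
```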

Note · two distinct confidence systems. Per-fact confidence (above) gates whether the engine will compute on a fact. Per-recommendation confidence (next section) gates whether the principal should act on a conclusion. Same scale, different unit of analysis.

Recommendation confidence bands

Every structural recommendation carries a number. The number means something specific. Bands map to action permissions.

95–100%
Safe-harbor SAFE-HARBOR
Verified by primary authority or professional sign-off. Suitable to act on without further review. Example: a deed transfer where the LSA is signed and Boone has confirmed the federal classification.
80–94%
Well-grounded WELL-GROUND
Solid analytical position; not yet formally signed off. Close to safe-harbor and typically suitable to begin executing pre-action steps (KYC packs, intake forms) while sign-off arrives.
65–79%
Defensible DEFENSIBLE
Position can be defended on the merits but depends on a contested or look-through claim. Requires professional sign-off before action. The one-step §351 recommendation lives here at 65%.
40–64%
Judgment call JUDGMENT
Multiple competing analyses are roughly balanced. Cannot act on this alone. Surface as a decision-required item; the principal chooses with full visibility into the trade-off.
<40%
Speculative SPECULATIVE
Not a recommendation. Mark as a research question; surface as a gap to investigate, not as an action to take. Rev. Proc. 2002-69 hybrid for NRA spouses sits here at ∼40%.
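A sketch of the band lookup, with the floors taken from the table above and the permission text paraphrased. `band_for` is a hypothetical name, and confidence is assumed to be a number of points from 0 to 100.

```python
# (floor, band, action permission), highest floor first.
BANDS = [
    (95, "SAFE-HARBOR", "act without further review"),
    (80, "WELL-GROUNDED", "start pre-action steps while sign-off arrives"),
    (65, "DEFENSIBLE", "professional sign-off before action"),
    (40, "JUDGMENT", "decision-required item for the principal"),
    (0, "SPECULATIVE", "research question, not an action"),
]


def band_for(confidence: float) -> tuple[str, str]:
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence must be in [0, 100], got {confidence}")
    for floor, label, permission in BANDS:
        if confidence >= floor:
            return label, permission
    raise AssertionError("unreachable: the last floor is 0")


print(band_for(65))  # ('DEFENSIBLE', 'professional sign-off before action')
```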
⚠ SWING RULE
A confidence number that changes by more than 15 points in response to a user critique, without a verified new fact, is itself a calibration bug. The engine flags it: “Confidence swing without facts: check whether this is a response to user framing rather than analysis.” See the self-flag ⚠ CONFIDENCE SWING WITHOUT FACTS in the self-flags section below.
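The swing rule reduces to a threshold test plus a check on what caused the move. In this sketch the accepted causes are the three named in the self-flag definition further down (Stage A retrieval, Stage E computation, professional sign-off); the string labels, including `user_critique`, and the function name are hypothetical.

```python
from typing import Optional

SWING_LIMIT = 15  # points, per the swing rule

# Causes that count as verified new facts; the labels are hypothetical.
TRACEABLE_CAUSES = {"stage_a_retrieval", "stage_e_computation", "professional_signoff"}


def swing_flag(previous: float, current: float, causes: list[str]) -> Optional[str]:
    # Critique pressure alone is not a traceable cause, so it is filtered out.
    traced = [c for c in causes if c in TRACEABLE_CAUSES]
    if abs(current - previous) > SWING_LIMIT and not traced:
        return "⚠ CONFIDENCE SWING WITHOUT FACTS"
    return None


# The two-step incident: roughly 80 -> 20 on critique pressure alone.
print(swing_flag(80, 20, ["user_critique"]))
```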

The seven-stage pre-recommendation pipeline

Every structural recommendation runs through Stages A–G before surfacing, followed by a final Stage Z (adversarial review pass) before display. Skipping a stage is a bug. Each stage emits an artifact, and the engine cannot show a recommendation that is missing any stage’s artifact.
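The display gate can be sketched as a lookup over per-stage artifacts, assuming the engine keeps them in a dict keyed by stage letter; artifact names follow the stage cards that come next. An absent or empty artifact counts as missing, and the fallback text is hypothetical wording for the open question the engine returns instead.

```python
# Artifact each stage must emit, in pipeline order; Z is the final gate.
REQUIRED_ARTIFACTS = {
    "A": "context manifest",
    "B": "chronological table",
    "C": "pattern-match table",
    "D": "counter-argument",
    "E": "numbered claim list",
    "F": "glossary table",
    "G": "confidence table",
    "Z": "inline adversarial",
}


def surface(recommendation: str, artifacts: dict[str, object]) -> str:
    # An artifact that is absent or empty counts as missing.
    missing = [stage for stage in REQUIRED_ARTIFACTS if not artifacts.get(stage)]
    if missing:
        # Refusal path: no recommendation is shown, only an open question.
        return "Open question: no artifact from stage(s) " + ", ".join(missing)
    return recommendation


full = {stage: f"<{name}>" for stage, name in REQUIRED_ARTIFACTS.items()}
print(surface("one-step §351", {**full, "D": ""}))  # Open question: no artifact from stage(s) D
```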

A
Knowledge retrieval CONTEXT MANIFEST
Query the vault for: prior analyses on the same / adjacent question, rejected patterns in the domain, open questions on relevant pages, applicable jurisdictional rules at federal / state / foreign levels. VAULT EXAMPLE For the FIRPTA workstream: 12 prior analyses retrieved, 4 rejected patterns including [[bvi-structure-one-vs-two]], 3 open questions on [[firpta-tax-mitigation-to-do]], jurisdictional sweep across IRS / CA SBE / BVI Companies Act.
Context manifest · retrieved + rejected + open
B
Time-state walkthrough LIFECYCLE · CLASSIFICATION
For each major lifecycle moment (formation, operation, structural events, succession, principal death), produce a one-line description of: legal classification, ownership, federal + state + foreign tax classification, threshold tests crossed. VAULT EXAMPLE Two-step LLC plan: at T_step-A, the LLC is multi-member ⇒ partnership for federal tax. At T_step-B, it is single-member ⇒ DRE again. Transient classification flips are flagged in red.
Chronological table · per-entity classification
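A federal-only sketch of the Stage B table for a domestic LLC. It assumes no entity-classification election, so the default rules apply (one member: disregarded entity; two or more: partnership), and it treats any change between consecutive moments as a flip to flag. Member counts, moment labels and function names are hypothetical; state and foreign columns are omitted.

```python
def default_federal_class(members: int) -> str:
    # Simplified default for a domestic LLC with no classification election.
    if members < 1:
        raise ValueError("an LLC needs at least one member")
    return "DRE" if members == 1 else "partnership"


def walkthrough(moments: list[tuple[str, int]]) -> list[tuple[str, int, str, bool]]:
    """One row per lifecycle moment: (moment, members, federal class, flipped?)."""
    rows, previous = [], None
    for moment, members in moments:
        cls = default_federal_class(members)
        rows.append((moment, members, cls, previous is not None and cls != previous))
        previous = cls
    return rows


two_step = [("formation", 1), ("T_step-A", 2), ("T_step-B", 1)]  # hypothetical counts
for row in walkthrough(two_step):
    print(row)
```

Both flipped rows would be the red entries: the partnership window at T_step-A and the return to DRE at T_step-B.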
C
Rejected-pattern test PRIOR ANALYSIS · APPLICATION
For each pattern retrieved in Stage A, check whether the proposal recreates that pattern at any moment in the lifecycle. If yes & the rejection still applies: proposal is dead. If yes but a specific reason makes it non-applicable: state the reason. If no: state which patterns were checked. VAULT EXAMPLE Two-step recreates the multi-member-LLC partnership pattern that killed dual-BVI in [[bvi-structure-one-vs-two]]. Same federal mechanic. Same rejection still applies. Two-step is dead.
Pattern-match table · recreate? still applies?
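A sketch of the Stage C decision rule, assuming each rejected pattern can be reduced to a classification state that recreates it and that the timeline comes from the Stage B walkthrough. `RejectedPattern` and its fields are hypothetical; the three branches mirror the three outcomes described above.

```python
from dataclasses import dataclass


@dataclass
class RejectedPattern:
    page: str                       # vault page that recorded the rejection
    trigger: str                    # classification state that recreates the pattern
    still_applies: bool = True
    reason_not_applicable: str = ""


def rejected_pattern_test(timeline: dict[str, str], patterns: list[RejectedPattern]):
    """Return (proposal_dead, rows), one row per pattern checked."""
    dead, rows = False, []
    for pat in patterns:
        recreated = pat.trigger in timeline.values()
        if recreated and pat.still_applies:
            dead, verdict = True, "rejection still applies: proposal is dead"
        elif recreated:
            if not pat.reason_not_applicable:
                raise ValueError(f"{pat.page}: recreated, but no reason given")
            verdict = "recreated but not applicable: " + pat.reason_not_applicable
        else:
            verdict = "checked: not recreated"
        rows.append((pat.page, recreated, pat.still_applies, verdict))
    return dead, rows


two_step = {"formation": "DRE", "T_step-A": "partnership", "T_step-B": "DRE"}
dead, rows = rejected_pattern_test(two_step, [RejectedPattern("bvi-structure-one-vs-two", "partnership")])
print(dead, rows[0][3])  # True rejection still applies: proposal is dead
```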
D
Strongest counter-argument ADVERSARIAL · PRE-AFFIRMATIVE
Before writing the affirmative case, generate the most damaging counter-argument. Must include: an alternative structure, the alternative’s advantage, the recommendation’s disadvantage under the alternative’s lens. If no strong counter is generated, the problem isn’t understood deeply enough — return to Stage A. VAULT EXAMPLE For one-step §351, the counter is §62(a)(2) look-through failure. Surfaced before the one-step affirmative case is composed — not after the principal pushes back.
Counter-argument · filed alongside
E
Quantitative validation SPECIFICITY · COMPUTATION
Every numeric claim must be backed by: shown computation, citation to vault analysis or external authority, or explicit “rough estimate” qualifier. Comparative claims (“much higher”, “significantly”) are bugs without underlying numbers. No retrofitted claims. VAULT EXAMPLE Phantom “$50K+/yr long-horizon cost” from one-step look-through failure had no year-by-year math. Stage E required the table; the table showed Scenario B is cheaper, not costlier. The claim was wrong.
Numbered claim list · each with backing
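Stage E's backing rule is simple enough to state as code. A sketch, assuming each claim is recorded with optional working, citation and rough-estimate fields (hypothetical names); a claim with none of the three is returned as unbacked, which is the condition the ⚠ QUANTITATIVE WITHOUT COMPUTATION flag reports.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    computation: str = ""        # shown working
    citation: str = ""           # vault analysis or external authority
    rough_estimate: bool = False


def unbacked(claims: list[Claim]) -> list[str]:
    # A numeric or comparative claim passes only with at least one kind of backing.
    return [c.text for c in claims if not (c.computation or c.citation or c.rough_estimate)]


claims = [
    Claim("$50K+/yr long-horizon cost of one-step"),
    Claim("Scenario B cheaper at every horizon", computation="year-by-year table"),
]
print(unbacked(claims))  # ['$50K+/yr long-horizon cost of one-step']
```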
F
Conflation check SEMANTICS · SHARED-NAME PHENOMENA
For each load-bearing term, check whether it maps to multiple distinct phenomena. If so, enumerate separately and verify which the claim attaches to. Examples: “reassessment”, “step transfer”, “DRE”, “principal”. VAULT EXAMPLE “Reassessment” can fire at §62(a)(2) (contribution date), §64(d) (first death), or §64(c) (entity-level change). Same word, three distinct events. The engine forces them apart on every page that uses the bare term.
Glossary table · term ↔ meaning ↔ claim
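A crude sketch of the bare-term check for "reassessment", assuming a sentence counts as disambiguated when it cites one of the three section markers. Sentence splitting on end punctuation is naive (it would break on abbreviations such as "Rev. Proc."), and the marker table is hypothetical; other terms would need their own markers.

```python
import re

# Multi-referent term -> markers that pin a use to one sense (hypothetical table).
SENSE_MARKERS = {
    "reassessment": ("§62(a)(2)", "§64(d)", "§64(c)"),
}


def bare_uses(text: str) -> list[tuple[str, str]]:
    """Return (term, sentence) for each sentence using a term with no sense marker."""
    flags = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        for term, markers in SENSE_MARKERS.items():
            if term in lowered and not any(m in sentence for m in markers):
                flags.append((term, sentence))
    return flags


text = ("Reassessment is triggered at contribution. "
        "The §64(d) reassessment is triggered at first death.")
print(bare_uses(text))  # [('reassessment', 'Reassessment is triggered at contribution.')]
```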
G
Confidence + gap statement CALIBRATION · SIGN-OFF MAP
Produce a confidence number for the recommendation as a whole AND for each major sub-claim. Each number is accompanied by: what’s verified, what’s NOT verified, which professional’s sign-off would resolve each unverified item, the cost of being wrong on each item. VAULT EXAMPLE One-step §351 lands at 65% overall: 90% on the partnership-window finding, 60% on the look-through, 95% on the bounded-downside math, 85% on §64(d) structure-independence. Boone unblocks the federal items; Yiqi unblocks the CA items.
Confidence table · per sub-claim & overall
Z
Adversarial review pass FINAL GATE · BEFORE DISPLAY
After Stages A–G complete, the engine runs a final adversarial pass: “Imagine you’re a specialist whose job is to find the flaw in this recommendation. You have full vault access. Where do you look first?” If a real flaw surfaces, return to Stage A. If only weak challenges do, proceed and report inline: “The strongest challenge I generated was X. It is addressed by Y.” A recommendation that produces no challenges in this pass is suspect — either the problem is genuinely shallow or the engine hasn’t pushed hard enough.
Inline adversarial · strongest + addressed

All seven stages plus Stage Z ran on the FIRPTA recommendation v2 (see Recommendation). Stages A and C caught the partnership-window flaw that a human reviewer had missed across three turns. Stages D and E caught a fabricated $50K+/yr claim that had been inserted to balance the recommendation. The engine doesn’t promise correct answers; it promises that the failures will be the kind a reviewer can see.

The six self-flags — visible to the user, inline

Self-flagging is a feature, not a sign of unreliability. It’s the alternative to silent failure. These chips appear inline in the engine’s output, not in a hidden audit log. The user sees them; the next engine run sees them.

⚠ QUANTITATIVE WITHOUT COMPUTATION

A number was produced without inline computation or citation

Fires when a dollar amount, percentage, time period, or comparative magnitude is asserted without one of: shown computation, citation to vault / external authority, or explicit “rough estimate” tag.

Source incident: phantom “$50K+/yr long-horizon cost” without backing math (reflection.md, T3).
⚠ AFFIRMATIVE WITHOUT COUNTER-ARG

The pro-recommendation case was written before Stage D ran

Fires when affirmative-case prose appears in the output and the Stage D counter-argument artifact is missing, empty, or generated post-hoc. The order matters — counter-argument first, affirmative case second.

Source incident: “Why two steps?” affirmative callout written before the partnership-window counter was considered (reflection.md, T1).
⚠ TERM WITHOUT DISAMBIGUATION

A load-bearing term with multiple meanings was used unqualified

Fires when terms with known multiple referents (“reassessment”, “contribution”, “step”, “transfer”, “DRE”, “NRA”, “principal”) appear without specifying which sub-concept. Stage F should have caught it; if it’s in the output anyway, the flag fires.

Source incident: “reassessment” conflated §62(a)(2) with §64(d), producing wrong-magnitude analysis (reflection.md, T4).
⚠ CONFIDENCE SWING WITHOUT FACTS

A confidence number moved by more than 15 points without a verified new fact

Fires when a recommendation’s confidence delta between turns exceeds 15 points and the change is not traceable to a Stage A retrieval result, a Stage E computation, or a professional sign-off.

Source incident: confidence in two-step swung from “working plan” (implicit ~80%) to “unambiguously worse” (implicit ~20%) on critique pressure alone (reflection.md, T2→T3).
⚠ RESPONDING TO USER FRAMING

The output is structured around the user’s last move rather than facts

Fires when the engine’s response shape mirrors the critique’s vector (e.g., critique pushes toward A ⇒ engine pivots toward A) without verified new facts that justify the pivot. Detected by comparing the output’s argument structure to the critique’s framing.

Source incident: four positions in five turns, each pivoting on user framing rather than independent verification (reflection.md, root cause B).
⚠ NOVEL POSITION AS SAFE-HARBOR

An untested authority is presented without the novel-position flag

Fires when a position depends on authority that lacks a published pronouncement on the specific factual pattern (e.g., Rev. Proc. 2002-69 for NRA spouses) but the output presents it as established practice rather than as a position with failure modes.

Source incident: Rev. Proc. 2002-69 hybrid initially proposed without flagging the NRA-application novelty — corrected in v2.

Each flag is rendered as the chip pattern shown above when fired. When cleared (the check ran and passed), it shows as ✓ CHECK CLEARED at the bottom of the recommendation. Visibility, not silence.
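Rendering can be sketched as a split of the six flags into fired and cleared. The flag names come from the list above; appending the flag name to ✓ CHECK CLEARED, and the function name, are assumptions.

```python
FLAGS = (
    "QUANTITATIVE WITHOUT COMPUTATION",
    "AFFIRMATIVE WITHOUT COUNTER-ARG",
    "TERM WITHOUT DISAMBIGUATION",
    "CONFIDENCE SWING WITHOUT FACTS",
    "RESPONDING TO USER FRAMING",
    "NOVEL POSITION AS SAFE-HARBOR",
)


def render_flags(fired: set[str]) -> tuple[list[str], list[str]]:
    """Return (inline chips for fired flags, footer lines for cleared checks)."""
    unknown = fired - set(FLAGS)
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    inline = [f"⚠ {name}" for name in FLAGS if name in fired]
    footer = [f"✓ CHECK CLEARED · {name}" for name in FLAGS if name not in fired]
    return inline, footer


inline, footer = render_flags({"TERM WITHOUT DISAMBIGUATION"})
print(inline)       # ['⚠ TERM WITHOUT DISAMBIGUATION']
print(len(footer))  # 5
```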

One failure, with and without the pipeline

A four-turn correction cycle on the FIRPTA two-step proposal · what happened · what the seven-stage pipeline + self-flags would have shortened.

WITHOUT THE PIPELINE

What actually happened · 2026-05-14 to 05-15

  1. T1 · Two-step §351 proposed. Affirmative case written: “cleaner for CA Prop-13 / DTT mechanics.”
  2. T2 · Principal critiques: the same logic that rejected dual-BVI rejects two-step’s transitional window. The partnership classification at T_step-A had been missed.
  3. T3 · Overcorrection: “one-step is unambiguously better.” A new phantom drawback is fabricated, “$50K+/yr long-horizon cost”, with no underlying math.
  4. T4 · Principal: “think harder.” Math still not done. Phantom number repeated.
  5. T5 · Principal asks for the math. A year-by-year table is built. The phantom drawback is wrong; one-step is in fact cleaner.
  6. Five-turn cycle. Confidence swung wildly. The principal did the analytical work.
WITH THE PIPELINE

What would have surfaced at T1

  1. A · Stage A retrieves [[bvi-structure-one-vs-two]] as a rejected pattern; the multi-member-LLC partnership-treatment risk is on the working page.
  2. B · Stage B time-state walkthrough labels the LLC at T_step-A as “multi-member ⇒ partnership for federal tax.”
  3. C · Stage C rejected-pattern test fires: the rejection still applies. The two-step proposal is dead at T1.
  4. D · Stage D surfaces the counter-argument for one-step, §62(a)(2) look-through failure, before the affirmative case is written.
  5. E · Stage E demands year-by-year math. The phantom claim doesn’t pass. Real math: Scenario B is cheaper at every horizon.
  6. F · Stage F separates “reassessment” into §62(a)(2) and §64(d) events, each with its own basis and timing.
  7. G · Stage G outputs 65% overall confidence with a sign-off map: Boone for federal, Yiqi for CA.
  8. All stages pass. The recommendation lands as a three-option matrix at T1, with no four-turn cycle.

“You can be completely wrong if you get all logics right but missed one key point or key logic in the chain. Interrogate yourself.” — Principal critique, 2026-05-15. Recorded in reflection.md and now compiled into the predicate set, the pipeline, and the self-flags above.

What the engine never does

Hard refusals. These aren’t preferences — they are structural commitments that mirror the vault’s “What I Never Do” protocol.

REFUSAL · OVERWRITE

Silently overwrite a contradicting fact

A new source that contradicts an existing fact triggers a contradiction flag on both pages. Neither is silently replaced. The principal resolves; the resolution becomes a typed Analysis node.
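A sketch of the no-overwrite rule, assuming facts live on named pages as key/value pairs. `Page`, `ingest_fact` and `resolve` are hypothetical, the address values are placeholders, and leaving the contradiction flags in place after resolution (as history) is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    name: str
    facts: dict[str, str] = field(default_factory=dict)
    flags: list[str] = field(default_factory=list)


def ingest_fact(existing: Page, source: Page, key: str, value: str) -> bool:
    """Write a fact from a new source; on conflict, flag both pages and write nothing."""
    current = existing.facts.get(key)
    if current is not None and current != value:
        note = f"contradiction on {key}: {current!r} ({existing.name}) vs {value!r} ({source.name})"
        existing.flags.append(note)
        source.flags.append(note)
        return False
    existing.facts[key] = value
    return True


def resolve(existing: Page, key: str, chosen: str, rationale: str) -> dict:
    """Apply the principal's choice and return it as a typed Analysis record."""
    existing.facts[key] = chosen
    return {"type": "Analysis", "page": existing.name, "key": key,
            "chosen": chosen, "rationale": rationale}


person = Page("Tao Qing", {"current_address": "address on court file"})
source = Page("new source (hypothetical)")
ok = ingest_fact(person, source, "current_address", "address in new source")
print(ok, person.facts["current_address"])  # False address on court file
```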

REFUSAL · STALE INPUT

Compute on stale load-bearing facts

Any computation that drives a tax filing, KYC submission, or deed transfer requires conf ≥ 0.95 on every input. Below threshold, the engine pauses and asks for a refresh — not a guess.

REFUSAL · UNCHECKED RECOMMEND

Surface a recommendation that skipped a pipeline stage

Stages A–G + Z are write-time gates, not polish. A recommendation missing any stage’s artifact is not displayed — the engine returns an open question instead.

REFUSAL · SUPERSEDED REUSE

Cite a superseded page in active reasoning

Pages with status: superseded are still readable but not citable. The supersession edge is followed forward to the current version before any predicate evaluates against it.
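Following supersession edges forward is a short walk with cycle protection. A sketch, assuming each superseded page has exactly one successor; the function and page names are hypothetical.

```python
def current_version(page: str, superseded_by: dict[str, str]) -> str:
    """Follow supersession edges forward to the first page that is not superseded."""
    seen = {page}
    while page in superseded_by:
        page = superseded_by[page]
        if page in seen:
            raise ValueError(f"supersession cycle at {page}")
        seen.add(page)
    return page


edges = {"firpta-plan-v1": "firpta-plan-v2", "firpta-plan-v2": "firpta-plan-v3"}  # hypothetical
print(current_version("firpta-plan-v1", edges))  # firpta-plan-v3
```

Predicates would then evaluate against the returned page, never the superseded one.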

REFUSAL · FABRICATED COUNTER

Invent a counter-argument to appear thoughtful

When asked “are you sure?” the engine re-runs Stage D honestly. If no real counter surfaces, it says so. Fake nuance is worse than visible confidence — ⚠ NOVEL POSITION AS SAFE-HARBOR would fire on a fabricated counter.

REFUSAL · CONFLATION

Treat two distinct events as one because they share a name

“Reassessment”, “trust”, “NRA”, “principal” — load-bearing terms with multiple referents. Each appearance is resolved to one referent before a predicate fires.