Legacy — Rule engine

The alphabet

Nine planning principles. Roughly thirty predicates derived from them. Each predicate is a query against the graph — not a habit, not a prompt, not a checklist someone has to remember.

PRINCIPLE 01

Full asset inventory

every_asset_has_owner · no orphan Asset nodes

every_asset_has_beneficiary_direction

no_undocumented_dispute_point

PRINCIPLE 02

Five risk checks

marriage_risk_modelled

intestacy_outcome_acceptable

creditor_pierce_modelled

tax_defensible_by_jurisdiction

simultaneous_death_survivable

PRINCIPLE 03

Tools match goals

tool_states_goal · every proposed tool names what it enforces

simpler_alternative_ruled_out

PRINCIPLE 04

Plans maintained

trigger_event_unprocessed · e.g., new asset, birth, citizenship change

advisor_transition_briefed

PRINCIPLE 05

Incapacity, not just death

durable_poa_in_every_jurisdiction

medical_authority_designated

no_single_signer_company · sole director ⇒ paralysis

PRINCIPLE 06

Liquidity at death

24mo_liquidity_covered · without forced sale

insurance_sized_to_estate_tax

no_double_use_of_liquid_asset

PRINCIPLE 07

Jurisdictional conflicts

governing_law_chosen_per_asset

will_valid_where_asset_sits

no_unspoken_intestacy_default

PRINCIPLE 08

Citizenship is hard

no_us_person_in_bvi_role

pfic_safe_inheritance_path

kyc_uses_compartment_passport

pending_immigration_treated_as_done

PRINCIPLE 09

Family before institutions

family_candidates_exhausted

institution_choice_justified

no_default_corporate_trustee

Principles 8 and 9 interact in a fixed order: citizenship gate first, then family-first. A family candidate who fails the citizenship check cannot fill the role regardless of preference. This ordering is itself a predicate — role_assignment_filters_p8_before_p9 — checked whenever the engine suggests a person for a role.
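
The ordering described above can be sketched as a two-pass filter. This is a minimal illustration, not the engine's implementation: the Candidate fields, the candidate names, and the gate function are hypothetical, and the citizenship gate is reduced to a single "no US citizenship" test for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str                 # hypothetical label, not a person from the vault
    citizenship: frozenset    # country codes, e.g. {"CN", "MT"}
    is_family: bool

def no_us_citizenship(candidate):
    """Stand-in for the Principle 8 gate (assumption: a single US test)."""
    return "US" not in candidate.citizenship

def suggest_for_role(candidates, citizenship_gate):
    """Apply the Principle 8 gate first, then the Principle 9 preference."""
    eligible = [c for c in candidates if citizenship_gate(c)]
    # sorted() is stable: family candidates move to the front, and the
    # original order is kept within each group.
    return sorted(eligible, key=lambda c: not c.is_family)

pool = [
    Candidate("family member A", frozenset({"US", "CN"}), True),
    Candidate("institution X", frozenset(), False),
    Candidate("family member B", frozenset({"CN", "MT"}), True),
]
ranked = suggest_for_role(pool, no_us_citizenship)
print([c.name for c in ranked])
```

Family member A is removed by the gate before preference is considered, so being family never rescues a candidate who fails the citizenship check.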

When the engine fires

Five hook points across the session lifecycle. The same predicate alphabet runs at each — the difference is what's being judged and what the engine does when it finds a problem.

T · SESSION START

Load context

Open items, recent decisions, what changed since last session, what needs principal attention.

T · ON INGEST

Validate write

New source ingested. Predicates fire on the new + adjacent nodes. Contradictions flagged on both pages.

T · ON QUERY

7-stage pipeline

Stages A–G run before any structural recommendation surfaces. See § pipeline below.

T · SESSION END

Crystallise

Conclusions, overrides, and reconciliations distilled into typed Analysis nodes.

T · SCHEDULED

Freshness sweep

Nightly background pass. Stale facts surfaced; low-confidence load-bearing facts flagged for refresh.

Same predicate alphabet · five different moments of evaluation

One rule, walked through

Principle 8 · no_us_person_in_bvi_role · firing when USLP submits the BVI HoldCo KYC pack for ingest, 2026-05-10.

GRAPH STATE

What the engine reads

VEHICLE BVI HoldCo

edge party_of (Dan Hu, shareholder, 50%)

edge party_of (Chenwen, shareholder, 50%)

Dan Hu citizenship: [CN, MT, GD]

Dan Hu tax_residency: NRA · conf 1.0

Chenwen citizenship: [CN, MT]

Chenwen tax_residency: NRA · conf 1.0

Nate Hu citizenship: [US, CN] · child · not in party_of

KYC pack passports submitted: MT251400, MT251461

→

PREDICATES

What the engine checks

no_us_person_in_bvi_role

For each party_of edge on a BVI vehicle, the party's citizenship must not contain US.

PASS Dan: no US

PASS Chenwen: no US

kyc_uses_compartment_passport

For BVI KYC of a Chinese-national principal, prefer Malta/Grenada passport. Chinese passport submission is a flag.

PASS Dan: MT251400 used

PASS Chenwen: MT251461 used

→

OUTCOME

What the principal sees

Briefing line: “14 entities read · 0 rules violated. Citizenship compartmentalisation satisfied.”

No prompt to the principal. The check ran silently. The pass-state is logged so a future audit can reconstruct which version of the predicate cleared which version of the graph.

If any predicate had failed, the engine would have paused the ingest, surfaced the failure, and asked: “Acknowledge, override with rationale, or repair?”
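
The two checks in this walkthrough can be sketched as plain functions over the graph state shown above. The dictionary layout is an assumption made for illustration, as is the idea that each submitted passport is stored with its issuing country ("MT" here, inferred from the PASS lines); the real graph schema is not shown in this document.

```python
# Graph state from the walkthrough, in an assumed dictionary layout.
VEHICLE = {
    "name": "BVI HoldCo",
    "party_of": [("Dan Hu", "shareholder", 0.50), ("Chenwen", "shareholder", 0.50)],
}
CITIZENSHIP = {
    "Dan Hu": ["CN", "MT", "GD"],
    "Chenwen": ["CN", "MT"],
    "Nate Hu": ["US", "CN"],  # child, not in party_of, so never evaluated
}
# Assumption: each passport number is stored with its issuing country.
KYC_PASSPORTS = {"Dan Hu": ("MT251400", "MT"), "Chenwen": ("MT251461", "MT")}
COMPARTMENT_ISSUERS = {"MT", "GD"}  # Malta, Grenada

def no_us_person_in_bvi_role(vehicle, citizenship):
    """PASS (True) per party when the party's citizenship does not contain US."""
    return {party: "US" not in citizenship[party]
            for party, _role, _share in vehicle["party_of"]}

def kyc_uses_compartment_passport(vehicle, citizenship, passports):
    """PASS (True) per Chinese-national party when a Malta/Grenada passport was used."""
    results = {}
    for party, _role, _share in vehicle["party_of"]:
        if "CN" not in citizenship[party]:
            continue  # predicate only applies to Chinese-national principals
        _number, issuer = passports[party]
        results[party] = issuer in COMPARTMENT_ISSUERS
    return results

us_check = no_us_person_in_bvi_role(VEHICLE, CITIZENSHIP)
kyc_check = kyc_uses_compartment_passport(VEHICLE, CITIZENSHIP, KYC_PASSPORTS)
print(us_check, kyc_check)
```

Both predicates iterate only over party_of edges, which is why Nate Hu's US citizenship does not cause a failure here.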

When the engine refuses to compute

Every fact carries a confidence score, a valid_from, and a last_verified. Load-bearing decisions (tax, KYC, deed) require fresh, high-confidence inputs. The engine knows what it's standing on.

Fact	Confidence	Source & date	Engine behaviour
Dan Hu · malta_passport · MT251400	1.00 · load-bearing	Passport scan · 2025-11	USE · Eligible for BVI KYC. No prompt.
Nate Hu · us_citizenship	1.00 · load-bearing	US birth cert · 2011	USE · PFIC predicate runs against this fact.
Pomelo · expected_birth_year	0.50 · owner intent only	Conversation · 2026-03	SHOW + TOOLTIP · Used for forward-look planning. Hover surfaces “owner intent only.”
Tao Qing · current_address	0.30 · stale 2019	Court file · 2019-02-27	REFRESH REQUESTED · Not load-bearing for current tasks. Flagged in nightly sweep.
Nate Hu · mailing_address · for Form 3520	0.65 · below threshold	Tao Qing self-report · 2024	REFUSE TO COMPUTE · Form 3520 amendment requires conf ≥ 0.95. Engine pauses; principal must confirm a current address before drafting.

The threshold is not arbitrary. It comes from the cost of being wrong: a passport number used for KYC, or an heir’s mailing address used for an IRS form, becomes a fileable mistake if the underlying fact is wrong. The engine refuses, not because it doesn’t know how, but because the source isn’t strong enough.
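
One way to read the table above is as a small decision function. The sketch below is an assumption about how the four behaviours could be ordered; the 0.95 threshold comes from the Form 3520 row, while the Fact fields and the branch order are hypothetical.

```python
from dataclasses import dataclass

LOAD_BEARING_THRESHOLD = 0.95  # from the Form 3520 row in the table

@dataclass
class Fact:
    name: str
    confidence: float
    load_bearing: bool   # load-bearing for the task at hand (assumed field)
    stale: bool = False  # assumed to be derived from last_verified

def engine_behaviour(fact):
    """Map a fact to one of the four behaviours shown in the table."""
    if fact.load_bearing:
        fresh_and_strong = fact.confidence >= LOAD_BEARING_THRESHOLD and not fact.stale
        return "USE" if fresh_and_strong else "REFUSE TO COMPUTE"
    if fact.stale:
        return "REFRESH REQUESTED"
    if fact.confidence < LOAD_BEARING_THRESHOLD:
        return "SHOW + TOOLTIP"
    return "USE"

rows = [
    Fact("Dan Hu · malta_passport", 1.00, load_bearing=True),
    Fact("Nate Hu · us_citizenship", 1.00, load_bearing=True),
    Fact("Pomelo · expected_birth_year", 0.50, load_bearing=False),
    Fact("Tao Qing · current_address", 0.30, load_bearing=False, stale=True),
    Fact("Nate Hu · mailing_address", 0.65, load_bearing=True),
]
behaviours = [engine_behaviour(f) for f in rows]
print(behaviours)
```

Under these assumptions the function reproduces the behaviour column of the table row for row.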

Note · two distinct confidence systems. Per-fact confidence (above) gates whether the engine will compute on a fact. Per-recommendation confidence (next section) gates whether the principal should act on a conclusion. Same scale, different unit of analysis.

Recommendation confidence bands

Every structural recommendation carries a number. The number means something specific. Bands map to action permissions.

95–100%

Safe-harbor

Verified by primary authority or professional sign-off. Suitable to act on without further review. Examples: a deed transfer where the LSA is signed and Boone has confirmed the federal classification.

80–94%

Well-grounded

Solid analytical position; not yet formally signed off. Close to safe-harbor and typically suitable to begin executing pre-action steps (KYC packs, intake forms) while sign-off arrives.

65–79%

Defensible

Position can be defended on the merits but depends on a contested or look-through claim. Requires professional sign-off before action. The one-step §351 recommendation lives here at 65%.

40–64%

Judgment call

Multiple competing analyses are roughly balanced. Cannot act on this alone. Surface as a decision-required item; the principal chooses with full visibility into the trade-off.

<40%

Speculative

Not a recommendation. Mark as a research question; surface as a gap to investigate, not as an action to take. Rev. Proc. 2002-69 hybrid for NRA spouses sits here, at just under 40%.
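
The band boundaries above reduce to a lookup by lower bound. A minimal sketch, with the action permissions paraphrased from the band descriptions; the function name and tuple layout are illustrative choices.

```python
# (lower bound in %, band name, action permission paraphrased from the text)
BANDS = [
    (95, "Safe-harbor", "act without further review"),
    (80, "Well-grounded", "begin pre-action steps while sign-off arrives"),
    (65, "Defensible", "professional sign-off required before action"),
    (40, "Judgment call", "surface as a decision-required item"),
    (0, "Speculative", "research question, not an action"),
]

def band_for(confidence_pct):
    """Return (band name, action permission) for a 0-100 confidence number."""
    if not 0 <= confidence_pct <= 100:
        raise ValueError(f"confidence out of range: {confidence_pct}")
    for lower_bound, name, permission in BANDS:
        if confidence_pct >= lower_bound:
            return name, permission
    raise AssertionError("unreachable: the last band starts at 0")

print(band_for(65))
```

The one-step §351 recommendation at 65% sits on the lower edge of Defensible, so it still requires professional sign-off before action.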

⚠ SWING RULE

A confidence number that changes by more than 15 points in response to a user critique without a verified new fact is itself a calibration bug. The engine flags it: “Confidence swing without facts — check whether this is response to user framing rather than analysis.” See self-flag ⚠ CONFIDENCE SWING WITHOUT FACTS in the next section.
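
The swing rule is a threshold test on the change between two turns. A minimal sketch, assuming the engine can supply a list of verified new facts for the turn; the function and parameter names are hypothetical.

```python
SWING_LIMIT = 15  # points, from the swing rule

def confidence_swing_flag(previous_pct, current_pct, verified_new_facts):
    """Return the self-flag text when confidence moved too far without new facts."""
    moved = abs(current_pct - previous_pct)
    if moved > SWING_LIMIT and not verified_new_facts:
        return "⚠ CONFIDENCE SWING WITHOUT FACTS"
    return None

# A drop from roughly 80% to roughly 20% with nothing newly verified:
print(confidence_swing_flag(80, 20, verified_new_facts=[]))
```

A move of exactly 15 points does not fire, since the rule says "more than 15".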

The seven-stage pre-recommendation pipeline

Every structural recommendation runs through Stages A–G before surfacing, followed by a final Stage Z (adversarial review pass) before display. Skipping a stage is a bug. Each stage emits an artifact, and the engine cannot show a recommendation that lacks any artifact.
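
The artifact requirement can be sketched as a display gate. This is an illustration only: it assumes artifacts are collected in a dictionary keyed by stage letter, and that an empty or missing entry counts as a skipped stage.

```python
REQUIRED_STAGES = ("A", "B", "C", "D", "E", "F", "G", "Z")

def display_gate(artifacts):
    """Return (displayable, missing_stages) for a recommendation's artifacts."""
    missing = [stage for stage in REQUIRED_STAGES if not artifacts.get(stage)]
    return (not missing, missing)

# Hypothetical artifact set with Stage D (counter-argument) left empty.
artifacts = {
    "A": "context manifest", "B": "chronological table", "C": "pattern-match table",
    "D": "", "E": "numbered claim list", "F": "glossary table",
    "G": "confidence table", "Z": "inline adversarial",
}
print(display_gate(artifacts))
```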

A

Knowledge retrieval

CONTEXT MANIFEST

Query the vault for: prior analyses on the same / adjacent question, rejected patterns in the domain, open questions on relevant pages, applicable jurisdictional rules at federal / state / foreign levels.

VAULT EXAMPLE

For the FIRPTA workstream: 12 prior analyses retrieved, 4 rejected patterns including [[bvi-structure-one-vs-two]], 3 open questions on [[firpta-tax-mitigation-to-do]], jurisdictional sweep across IRS / CA SBE / BVI Companies Act.

Context manifest · retrieved + rejected + open

B

Time-state walkthrough

LIFECYCLE · CLASSIFICATION

For each major lifecycle moment (formation, operation, structural events, succession, principal death), produce a one-line description of: legal classification, ownership, federal + state + foreign tax classification, threshold tests crossed.

VAULT EXAMPLE

Two-step LLC plan: at T_step-A, LLC is multi-member ⇒ partnership for federal tax. At T_step-B, single-member ⇒ DRE again. Transient classification flips are flagged in red.

Chronological table · per-entity classification

C

Rejected-pattern test

PRIOR ANALYSIS · APPLICATION

For each pattern retrieved in Stage A, check whether the proposal recreates that pattern at any moment in the lifecycle. If yes and the rejection still applies: proposal is dead. If yes but a specific reason makes it non-applicable: state the reason. If no: state which patterns were checked.

VAULT EXAMPLE

Two-step recreates the multi-member-LLC partnership pattern that killed dual-BVI in [[bvi-structure-one-vs-two]]. Same federal mechanic. Same rejection still applies. Two-step is dead.

Pattern-match table · recreate? still applies?

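
The three branches of the Stage C test can be sketched as a small classifier that also builds the pattern-match table. The PatternCheck fields are hypothetical; in particular, representing "a specific reason makes it non-applicable" as an optional string is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PatternCheck:
    pattern: str                                 # page name of the rejected pattern
    recreated: bool                              # does the proposal recreate it?
    non_applicable_reason: Optional[str] = None  # why the old rejection no longer applies

def rejected_pattern_test(checks):
    """Return (proposal_is_dead, rows) where rows form the pattern-match table."""
    dead = False
    rows = []
    for check in checks:
        if not check.recreated:
            verdict = "checked, not recreated"
        elif check.non_applicable_reason:
            verdict = "recreated, rejection not applicable: " + check.non_applicable_reason
        else:
            verdict = "recreated, rejection still applies"
            dead = True
        rows.append((check.pattern, verdict))
    return dead, rows

# The two-step plan against the dual-BVI rejection from the vault example.
dead, table = rejected_pattern_test(
    [PatternCheck("[[bvi-structure-one-vs-two]]", recreated=True)]
)
print(dead, table)
```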
D

Strongest counter-argument

ADVERSARIAL · PRE-AFFIRMATIVE

Before writing the affirmative case, generate the most damaging counter-argument. Must include: an alternative structure, the alternative’s advantage, the recommendation’s disadvantage under the alternative’s lens. If no strong counter is generated, the problem isn’t understood deeply enough; return to Stage A.

VAULT EXAMPLE

For one-step §351, the counter is §62(a)(2) look-through failure. Surfaced before the one-step affirmative case is composed, not after the principal pushes back.

Counter-argument · filed alongside

E

Quantitative validation

SPECIFICITY · COMPUTATION

Every numeric claim must be backed by: shown computation, citation to vault analysis or external authority, or explicit “rough estimate” qualifier. Comparative claims (“much higher”, “significantly”) are bugs without underlying numbers. No retrofitted claims.

VAULT EXAMPLE

Phantom “$50K+/yr long-horizon cost” from one-step look-through failure had no year-by-year math. Stage E required the table; the table showed Scenario B is cheaper, not costlier. The claim was wrong.

Numbered claim list · each with backing

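
The Stage E backing rule can be sketched as a filter over the numbered claim list. The NumericClaim fields are hypothetical; the three accepted forms of backing are taken from the stage description.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NumericClaim:
    text: str
    computation: Optional[str] = None  # shown computation
    citation: Optional[str] = None     # vault analysis or external authority
    rough_estimate: bool = False       # explicit "rough estimate" qualifier

def unbacked_claims(claims):
    """Return the text of every claim that has none of the three backings."""
    return [c.text for c in claims
            if not (c.computation or c.citation or c.rough_estimate)]

claims = [
    NumericClaim("$50K+/yr long-horizon cost"),                # the phantom claim
    NumericClaim("hypothetical figure", rough_estimate=True),  # tagged, so allowed
]
print(unbacked_claims(claims))
```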
F

Conflation check

SEMANTICS · SHARED-NAME PHENOMENA

For each load-bearing term, check whether it maps to multiple distinct phenomena. If so, enumerate separately and verify which the claim attaches to. Examples: “reassessment”, “step transfer”, “DRE”, “principal”.

VAULT EXAMPLE

“Reassessment” can fire at §62(a)(2) (contribution date), §64(d) (first death), or §64(c) (entity-level change). Same word, three distinct events. The engine forces them apart on every page that uses the bare term.

Glossary table · term ↔ meaning ↔ claim

G

Confidence + gap statement

CALIBRATION · SIGN-OFF MAP

Produce a confidence number for the recommendation as a whole AND for each major sub-claim. Each number is accompanied by: what’s verified, what’s NOT verified, which professional’s sign-off would resolve each unverified item, the cost of being wrong on each item.

VAULT EXAMPLE

One-step §351 lands at 65% overall: 90% on the partnership-window finding, 60% on the look-through, 95% on the bounded-downside math, 85% on §64(d) structure-independence. Boone unblocks the federal items; Yiqi unblocks the CA items.

Confidence table · per sub-claim & overall

Z

Adversarial review pass

FINAL GATE · BEFORE DISPLAY

After Stages A–G complete, the engine runs a final adversarial pass: “Imagine you’re a specialist whose job is to find the flaw in this recommendation. You have full vault access. Where do you look first?” If a real flaw surfaces, return to Stage A. If only weak challenges do, proceed and report inline: “The strongest challenge I generated was X. It is addressed by Y.” A recommendation that produces no challenges in this pass is suspect: either the problem is genuinely shallow or the engine hasn’t pushed hard enough.

Inline adversarial · strongest + addressed

All seven stages plus Stage Z ran on the FIRPTA recommendation v2 (see Recommendation). Stages A and C caught the partnership-window flaw that a human reviewer had missed across three turns. Stages D and E caught a fabricated $50K+/yr claim that had been inserted to balance the recommendation. The engine doesn’t promise correct answers; it promises that the failures will be the kind a reviewer can see.

The six self-flags — visible to the user, inline

Self-flagging is a feature, not a sign of unreliability. It’s the alternative to silent failure. These chips appear inline in the engine’s output, not in a hidden audit log. The user sees them; the next engine run sees them.

⚠ QUANTITATIVE WITHOUT COMPUTATION

A number was produced without inline computation or citation

Fires when a dollar amount, percentage, time period, or comparative magnitude is asserted without one of: shown computation, citation to vault / external authority, or explicit “rough estimate” tag.

Source incident: phantom “$50K+/yr long-horizon cost” without backing math (reflection.md, T3).

⚠ AFFIRMATIVE WITHOUT COUNTER-ARG

The pro-recommendation case was written before Stage D ran

Fires when affirmative-case prose appears in the output and the Stage D counter-argument artifact is missing, empty, or generated post-hoc. The order matters — counter-argument first, affirmative case second.

Source incident: “Why two steps?” affirmative callout written before the partnership-window counter was considered (reflection.md, T1).

⚠ TERM WITHOUT DISAMBIGUATION

A load-bearing term with multiple meanings was used unqualified

Fires when terms with known multiple referents (“reassessment”, “contribution”, “step”, “transfer”, “DRE”, “NRA”, “principal”) appear without specifying which sub-concept. Stage F should have caught it; if it’s in the output anyway, the flag fires.

Source incident: “reassessment” conflated §62(a)(2) with §64(d), producing wrong-magnitude analysis (reflection.md, T4).

⚠ CONFIDENCE SWING WITHOUT FACTS

A confidence number moved by more than 15 points without a verified new fact

Fires when a recommendation’s confidence delta between turns exceeds 15 points and the change is not traceable to a Stage A retrieval result, a Stage E computation, or a professional sign-off.

Source incident: confidence in two-step swung from “working plan” (implicit ~80%) to “unambiguously worse” (implicit ~20%) on critique pressure alone (reflection.md, T2→T3).

⚠ RESPONDING TO USER FRAMING

The output is structured around the user’s last move rather than facts

Fires when the engine’s response shape mirrors the critique’s vector (e.g., critique pushes toward A ⇒ engine pivots toward A) without verified new facts that justify the pivot. Detected by comparing the output’s argument structure to the critique’s framing.

Source incident: four positions in five turns, each pivoting on user framing rather than independent verification (reflection.md, root cause B).

⚠ NOVEL POSITION AS SAFE-HARBOR

An untested authority is presented without the novel-position flag

Fires when a position depends on authority that lacks a published pronouncement on the specific factual pattern (e.g., Rev. Proc. 2002-69 for NRA spouses) but the output presents it as established practice rather than as a position with failure modes.

Source incident: Rev. Proc. 2002-69 hybrid initially proposed without flagging the NRA-application novelty — corrected in v2.

Each flag is rendered as the chip pattern shown above when fired. When cleared (the check ran and passed), it shows as ✓ CHECK CLEARED at the bottom of the recommendation. Visibility, not silence.
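
As one illustration of how a flag might be detected, the ⚠ TERM WITHOUT DISAMBIGUATION check above can be approximated with a pattern match. The term list comes from the flag's description; the qualification convention (a bracketed referent immediately after the term, such as "reassessment [§64(d)]") is purely an assumption made for this sketch, not something the engine is known to use.

```python
import re

AMBIGUOUS_TERMS = ("reassessment", "contribution", "step", "transfer",
                   "DRE", "NRA", "principal")

def bare_terms(text):
    """Return the ambiguous terms that appear without a bracketed qualifier."""
    hits = []
    for term in AMBIGUOUS_TERMS:
        # Whole-word match that is NOT followed by "[...]" (assumed convention).
        pattern = re.compile(rf"\b{re.escape(term)}\b(?!\s*\[)", re.IGNORECASE)
        if pattern.search(text):
            hits.append(term)
    return hits

print(bare_terms("Reassessment fires at first death."))
print(bare_terms("Reassessment [§64(d)] fires at first death."))
```

A real check would need to resolve each term against the Stage F glossary rather than rely on surface syntax; the sketch only shows where the flag would attach.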

One failure, with and without the pipeline

A four-turn correction cycle on the FIRPTA two-step proposal · what happened · what the seven-stage pipeline + self-flags would have shortened.

WITHOUT THE PIPELINE

What actually happened · 2026-05-14 to 05-15

T1 · Two-step §351 proposed. Affirmative case written: “cleaner for CA Prop-13 / DTT mechanics.”
T2 · Principal critiques: same logic that rejected dual-BVI rejects two-step’s transitional window. Partnership classification at T_step-A missed.
T3 · Overcorrection: “one-step is unambiguously better.” New phantom drawback fabricated (“$50K+/yr long-horizon cost”) with no underlying math.
T4 · Principal: “think harder.” Math still not done. Phantom number repeated.
T5 · Principal asks for the math. Year-by-year table is built. Phantom drawback is wrong; one-step is in fact cleaner.
Five turns, four of them corrections. Confidence in two-step swung from an implicit ~80% to an implicit ~20% on critique pressure alone. The principal did the analytical work.

WITH THE PIPELINE

What would have surfaced at T1

A · Stage A retrieves [[bvi-structure-one-vs-two]] as a rejected pattern; multi-member-LLC partnership-treatment risk is on the working page.
B · Stage B time-state walkthrough labels the LLC at T_step-A as “multi-member ⇒ partnership for federal tax.”
C · Stage C rejected-pattern test fires: the rejection still applies. Two-step proposal is dead at T1.
D · Counter-argument for one-step: §62(a)(2) look-through failure surfaced before affirmative case is written.
E · Stage E demands year-by-year math. Phantom claim doesn’t pass. Real math: Scenario B cheaper at every horizon.
F · Stage F separates “reassessment” into §62(a)(2) and §64(d) events with their own bases and timings.
G · Stage G outputs 65% overall confidence with sign-off map: Boone for federal, Yiqi for CA.
All stages pass. Recommendation lands as a three-option matrix at T1 — no four-turn cycle.

“You can be completely wrong if you get all logics right but missed one key point or key logic in the chain. Interrogate yourself.” — Principal critique, 2026-05-15. Recorded in reflection.md and now compiled into the predicate set, the pipeline, and the self-flags above.

What the engine never does

Hard refusals. These aren’t preferences — they are structural commitments that mirror the vault’s “What I Never Do” protocol.

REFUSAL · OVERWRITE

Silently overwrite a contradicting fact

A new source that contradicts an existing fact triggers a contradiction flag on both pages. Neither is silently replaced. The principal resolves; the resolution becomes a typed Analysis node.
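
A minimal sketch of this refusal, assuming facts live in a dictionary keyed by fact name and each entry records its source page. The store layout, return strings, and sample values are hypothetical placeholders, not vault data.

```python
def ingest_fact(store, contradictions, key, value, page):
    """Write a fact unless it contradicts an existing one; never overwrite silently."""
    existing = store.get(key)
    if existing is not None and existing["value"] != value:
        # Flag both pages and leave the stored fact untouched.
        contradictions.append({
            "key": key,
            "pages": (existing["page"], page),
            "values": (existing["value"], value),
            "resolved": False,  # the principal resolves; not the engine
        })
        return "CONTRADICTION FLAGGED"
    store[key] = {"value": value, "page": page}
    return "WRITTEN"

store, contradictions = {}, []
first = ingest_fact(store, contradictions, "example_fact", "value-from-source-1", "page-1")
second = ingest_fact(store, contradictions, "example_fact", "value-from-source-2", "page-2")
print(first, second, store["example_fact"]["value"])
```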

REFUSAL · STALE INPUT

Compute on stale load-bearing facts

Any computation that drives a tax filing, KYC submission, or deed transfer requires conf ≥ 0.95 on every input. Below threshold, the engine pauses and asks for a refresh — not a guess.

REFUSAL · UNCHECKED RECOMMEND

Surface a recommendation that skipped a pipeline stage

Stages A–G + Z are write-time gates, not polish. A recommendation missing any stage’s artifact is not displayed — the engine returns an open question instead.

REFUSAL · SUPERSEDED REUSE

Cite a superseded page in active reasoning

Pages with status: superseded are still readable but not citable. The supersession edge is followed forward to the current version before any predicate evaluates against it.
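
Following the supersession edge forward can be sketched as a walk over a mapping from each superseded page to its successor. The mapping shape and page names are hypothetical; the cycle guard is an added safety assumption.

```python
def resolve_current(page_id, superseded_by):
    """Follow supersession edges forward until a page with no successor is reached."""
    seen = set()
    while page_id in superseded_by:
        if page_id in seen:
            raise ValueError(f"supersession cycle at {page_id!r}")
        seen.add(page_id)
        page_id = superseded_by[page_id]
    return page_id

# Hypothetical chain: v1 was superseded by v2, which was superseded by v3.
edges = {"recommendation-v1": "recommendation-v2",
         "recommendation-v2": "recommendation-v3"}
print(resolve_current("recommendation-v1", edges))
```

A page with no outgoing supersession edge resolves to itself, so the same call is safe on current pages.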

REFUSAL · FABRICATED COUNTER

Invent a counter-argument to appear thoughtful

When asked “are you sure?” the engine re-runs Stage D honestly. If no real counter surfaces, it says so. Fake nuance is worse than visible confidence: ⚠ RESPONDING TO USER FRAMING would fire on a fabricated counter.

REFUSAL · CONFLATION

Treat two distinct events as one because they share a name

“Reassessment”, “trust”, “NRA”, “principal” — load-bearing terms with multiple referents. Each appearance is resolved to one referent before a predicate fires.