01 · LIFECYCLE

Confidence scoring per fact

Every property carries a 0–1 score for how strong its source citation is. The rule engine reads it; the principal sees it.

PROPERTY            | VALUE       | CONFIDENCE | SOURCE / DATE
primary_citizenship | Chinese     | 1.0        | passport · 2025
dob                 | 1981-05-10  | 1.0        | passport · 2025
malta_passport      | MT251400    | 1.0        | passport · 2025
itin                | 992-88-9912 | 1.0        | IRS CP565 · 2026-04
cv_risk             | elevated    | 0.85       | physical · 2026-01
pomelo_birth_year   | 2027–2028   | 0.50       | owner intent only
tao_qing_address    | ???         | 0.30       | ⚠ stale 2019
janice_school       | ???         | 0.20       | ⚠ refresh needed

Rule-engine threshold for tax decisions: conf ≥ 0.95

Eight Dan Hu properties · confidence bars + source date

Every fact in the graph carries metadata: a confidence score (how strong is the citation), a valid_from, an optional valid_until, and a last_verified date. Confidence isn't decorative — the rule engine reads it.

High-confidence facts (≥0.95) are load-bearing: the tax simulator, the rules engine, and the briefing generator all use them directly. Medium-confidence facts (0.4 to just under 0.95) get a hover-tooltip showing the source so the user can judge. Low-confidence facts (<0.4) get a coral “refresh requested” flag, and the system pauses before relying on them.
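
A minimal sketch of how code might read those thresholds. The `Fact` class, the `tier` function, and the constant names are illustrative assumptions, not the vault's actual schema. The day-level dates are placeholders, since the table above only gives a year or month.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Thresholds from the text; the constant names are assumptions.
HIGH = 0.95
LOW = 0.40


@dataclass(frozen=True)
class Fact:
    name: str
    value: str
    confidence: float              # 0.0 to 1.0, strength of the citation
    valid_from: date
    last_verified: date
    valid_until: Optional[date] = None


def tier(fact: Fact) -> str:
    """Map a confidence score to the handling tier described above."""
    if not 0.0 <= fact.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {fact.confidence}")
    if fact.confidence >= HIGH:
        return "high"      # used directly by simulator, rules, briefings
    if fact.confidence >= LOW:
        return "medium"    # rendered with a source tooltip
    return "low"           # flagged "refresh requested"


# Placeholder dates: only the year or month is known from the table.
passport = Fact("malta_passport", "MT251400", 1.0, date(2025, 1, 1), date(2025, 1, 1))
cv_risk = Fact("cv_risk", "elevated", 0.85, date(2026, 1, 1), date(2026, 1, 1))
address = Fact("tao_qing_address", "???", 0.30, date(2019, 1, 1), date(2019, 1, 1))

print([tier(f) for f in (passport, cv_risk, address)])
```

Keeping the thresholds as named constants means the rule engine and the rendering layer can share one definition of "load-bearing".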

APPLIED HERE The Trust 1000 deed amendment for Nate's beneficiary status reads several facts: Dan's spouse, Nate's date of birth, Nate's US citizenship, the existing trust beneficiary list. Each carries its own confidence. If Nate's address (used for IRS Form 3520) drops below 0.7, the system refuses to compute the amendment until the principal confirms a current address.
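
The refusal described above can be sketched as a simple gate. This is an assumption-laden illustration: `NeedsConfirmation`, `weak_inputs`, and `require_confidence` are hypothetical names, the input scores are invented for the example, and applying a single 0.7 floor to every input is a simplification (the text only states it for Nate's address).

```python
from typing import Dict, List


class NeedsConfirmation(Exception):
    """Raised when an input fact is too weak for the decision at hand."""


def weak_inputs(confidences: Dict[str, float], minimum: float) -> List[str]:
    """Return the names of facts whose confidence is below the minimum."""
    return sorted(name for name, conf in confidences.items() if conf < minimum)


def require_confidence(confidences: Dict[str, float], minimum: float) -> None:
    """Refuse to proceed until every input fact meets the minimum."""
    weak = weak_inputs(confidences, minimum)
    if weak:
        raise NeedsConfirmation("confirm before computing: " + ", ".join(weak))


# Hypothetical scores; only the 0.7 floor comes from the text.
amendment_inputs = {
    "dan_spouse": 1.0,
    "nate_dob": 1.0,
    "nate_us_citizenship": 1.0,
    "trust_beneficiaries": 0.95,
    "nate_address": 0.6,
}

try:
    require_confidence(amendment_inputs, minimum=0.7)
except NeedsConfirmation as exc:
    print(exc)
```

Raising rather than returning a flag makes the refusal hard to ignore: the amendment code cannot run until the principal's confirmation lifts the score.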
02 · LIFECYCLE

Supersession as a first-class edge

When a will, trust deed, or POA is revised, the old version doesn't get overwritten. It gets explicitly superseded.

WILL · v1 · SUPERSEDED
Dan BVI Will · drafted 2026-05-09
standard substitute clause
conf 0.7 · pre-PFIC check

    ↑ supersedes

WILL · v2 · CURRENT
Dan BVI Will · revised 2026-05-10
+ explicit PFIC carve-out for Nate
conf 0.95 · USLP + BY&S reviewed
valid_from 2026-05-10

    ↑ will supersede

WILL · v3 · PLANNED
Dan BVI Will · after Pomelo or Arc is born
+ after-born child clause
draft trigger: live birth

AUDIT TRAIL
Every prior version is preserved with timestamps. The principal can see why v2 replaced v1 (PFIC reconciliation) and what triggered the change (BY&S review note 2026-05-10).

A typed chain of will revisions · supersedes is a first-class edge in the alphabet

Estate documents revise. Wills get rewritten when a child is born or a beneficiary dies. Trust deeds get amended when tax laws change. Powers of attorney expire and renew. The naive approach is to overwrite — but that loses history, and history is exactly what auditors and successor attorneys need.

With a supersedes edge in the alphabet, every revision creates a new node and links it to its predecessor. Old versions stay queryable; the “current” version is whichever node nothing else supersedes, that is, the one with no incoming supersedes edge.

APPLIED HERE Dan's BVI Will moves from v1 (standard substitute clause) to v2 (explicit PFIC carve-out for Nate, after USLP × BY&S reconciliation). v3 will be drafted when Pomelo or Arc is born. The decision log on the Professionals page shows v2's rationale; the supersession edge points to v1, which stays preserved.
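
A minimal sketch of the chain, taking the edge direction from the example above (v2 supersedes v1, so the edge points from the newer node to the older one). The node ids and function names are hypothetical, and `history` assumes each version replaces at most one predecessor.

```python
from typing import Dict, List, Set, Tuple

# (newer, older): the newer node supersedes the older one.
Edge = Tuple[str, str]


def current_versions(nodes: Set[str], supersedes: List[Edge]) -> Set[str]:
    """A node is current when no other node supersedes it."""
    superseded = {older for _, older in supersedes}
    return nodes - superseded


def history(node: str, supersedes: List[Edge]) -> List[str]:
    """Walk from a version back through everything it replaced."""
    older_of: Dict[str, str] = dict(supersedes)
    chain = [node]
    seen = {node}
    while chain[-1] in older_of:
        previous = older_of[chain[-1]]
        if previous in seen:
            raise ValueError("supersedes edges form a cycle")
        chain.append(previous)
        seen.add(previous)
    return chain


# v3 is only planned, so it has no node or edge yet.
nodes = {"will_v1", "will_v2"}
edges = [("will_v2", "will_v1")]

print(current_versions(nodes, edges))
print(history("will_v2", edges))
```

When v3 is drafted, adding the node and the edge `("will_v3", "will_v2")` is the whole update; nothing about v1 or v2 is rewritten.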
03 · ARCHITECTURE

Memory consolidation tiers

Working (raw) → episodic → semantic → procedural. Information moves up the stack as evidence accumulates.

TIER       | HOLDS                                                          | IN THIS PLAN                        | LIFESPAN
PROCEDURAL | rules + principles                                             | 9 principles · ~30 rule predicates  | permanent
SEMANTIC   | typed entities (people · assets · vehicles · trusts)           | 65 entities · 41 typed edges        | ~years
EPISODIC   | source pages + ingestion drafts (one page per source document) | 87 source pages · 26 analyses       | ~weeks
WORKING    | raw documents (PDFs · images · spreadsheets in raw/)           | ~90 raw files (PDFs + images + XLS) | ~days

Four memory tiers · higher = more compressed, more confident, longer-lived

Not all information is the same age, the same confidence, or the same lifespan. A raw passport PDF is one thing; the parsed fact “Dan is a Maltese citizen as of 2025” is another; the rule “use Maltese passport for BVI KYC, never Chinese” is yet another. They live at different levels of the stack.

The architecture already implements this implicitly — raw documents in raw/, sources in wiki/sources/, entities in wiki/entities/, principles in principles.md. v2 gives the tiers names and a promotion model: facts move up the stack as evidence confirms them.

APPLIED HERE A document arrives in working memory (the raw PDF). The LLM creates an episodic source page. The principal confirms; the typed semantic entities are written. If a rule fires consistently across many simulations, it gets promoted into procedural memory as a new principle or predicate. Lower tiers can be pruned without losing the higher ones — raw files are append-only, source pages can be archived once their entities are confirmed.
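
One way to picture the promotion model is as an ordered enum with a one-step promotion rule. This is a sketch under assumptions: the `Tier` enum and `promote` function are hypothetical, and "confirmed" stands in for whatever evidence the real system requires (principal confirmation, or a rule firing consistently).

```python
from enum import IntEnum


class Tier(IntEnum):
    WORKING = 0      # raw documents
    EPISODIC = 1     # source pages + ingestion drafts
    SEMANTIC = 2     # typed entities
    PROCEDURAL = 3   # rules + principles


def promote(tier: Tier, confirmed: bool) -> Tier:
    """Move one step up the stack, but only when evidence confirms it."""
    if not confirmed or tier is Tier.PROCEDURAL:
        return tier
    return Tier(tier + 1)


print(promote(Tier.EPISODIC, confirmed=True).name)
```

Using an ordered enum makes "higher tier" a plain comparison, so a check such as `tier >= Tier.SEMANTIC` can guard anything that should only read confirmed entities.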
04 · SCALE

Hybrid search: BM25 + vector + graph

Three search streams, fused. Each catches what the others miss.

QUERY
"that Hong Kong company we considered last year"

BM25 · KEYWORD
finds "HK", "Hong Kong" · exact + stemmed matches · stems / synonyms / NER

VECTOR · SEMANTIC
offshore · jurisdiction · agent · similar concepts, different words · embeddings · cosine sim

GRAPH · STRUCTURAL
vehicles with country=HK · typed-property walking · located_in · jurisdiction

RECIPROCAL RANK FUSION
1. Hawksford (BVI + HK + SG offices)
2. ICS Corporate Services (China-focused BVI agent)

One query · three search streams · fused result list

Past ~200 entities, a single index file becomes too big for any LLM to scan in one pass. You need real retrieval. Three streams complement each other: BM25 nails exact-term recall (find every page that says “Hong Kong”), vector embeddings nail semantic recall (find pages about offshore jurisdictions even without the word), graph traversal nails relational recall (find every vehicle whose located_in is HK).

Reciprocal rank fusion merges the three lists. A result that ranks high in any one stream rises; a result that ranks high in all three rises further.
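
A minimal sketch of the fusion step, using the usual reciprocal rank fusion formulation: each result scores the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is a commonly used default, not something this plan specifies, and the page ids and per-stream rankings below are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]], k: int = 60
) -> List[str]:
    """Fuse ranked lists: score(d) = sum of 1 / (k + rank of d in each list)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


# Hypothetical per-stream results for the Hong Kong query.
bm25 = ["hawksford", "ics_corporate_services"]
vector = ["ics_corporate_services", "hawksford", "other_page"]
graph = ["hawksford"]

fused = reciprocal_rank_fusion([bm25, vector, graph])
print(fused)
```

Because only ranks are used, the three streams never need comparable scores, which is what makes it practical to mix BM25, cosine similarity, and graph walks.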

APPLIED HERE The plan has ~25 entities today. Single index works fine. At scale (multi-family, advisor mode, or 5+ years of accumulated revisions), it won't — hybrid search is what keeps queries fast without the LLM having to read every page.
05 · AUTOMATION

Crystallization: completed work becomes structure

Every simulation, reconciliation, or decision auto-distills into a typed Analysis node.

WORK EVENTS
SIMULATION RUN · S1: Dan dies 2031 · tax + distribution + liquidity computed
RECONCILIATION · PFIC conflict surfaced · USLP × BY&S · principal resolved
OWNER OVERRIDE · Trust 1500 policyholder · system suggested change · user refused

CRYSTALLIZER
distill → extract entities, conclusions, source spans, provenance

ANALYSIS · CRYSTALLISED
"S1 saves $14.5M" · first-class node · permanent
"PFIC carve-out resolution" · cites USLP draft v1 + BY&S note
"Why we kept Dan as policyholder" · override rationale preserved

Work events → Crystallizer → typed Analysis nodes in the graph

Karpathy's original wiki said: “good answers can be filed back as new pages.” v2 takes this further. Every completed work product — a scenario simulation, a cross-professional reconciliation, an override decision — is itself a piece of knowledge worth keeping. The crystallizer auto-extracts the conclusion, the entities involved, the source spans, the rationale, and files it as a typed Analysis node.

This closes the loop: the system's own work product becomes input to future reasoning. The PFIC reconciliation gets cited the next time a similar conflict arises. The S1 simulation outcome informs scenario-3 planning. Reasoning compounds instead of evaporating.
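
A rough sketch of the shape such a node could take. The `WorkEvent` and `Analysis` classes and the `crystallize` function are hypothetical; in the real system the extraction would be done by the LLM, whereas here the event arrives already structured. Requiring at least one source is our assumption, added so that every Analysis stays traceable.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WorkEvent:
    kind: str                   # "simulation", "reconciliation", or "override"
    conclusion: str
    entities: Tuple[str, ...]
    sources: Tuple[str, ...]
    rationale: str


@dataclass(frozen=True)
class Analysis:
    title: str
    kind: str
    entities: Tuple[str, ...]
    cites: Tuple[str, ...]
    rationale: str


def crystallize(event: WorkEvent) -> Analysis:
    """Distill a finished work event into a typed Analysis node."""
    if not event.sources:
        # Assumption: an Analysis without provenance is not worth filing.
        raise ValueError("an Analysis needs at least one source")
    return Analysis(
        title=event.conclusion,
        kind=event.kind,
        entities=event.entities,
        cites=event.sources,
        rationale=event.rationale,
    )


pfic = WorkEvent(
    kind="reconciliation",
    conclusion="PFIC carve-out resolution",
    entities=("Dan BVI Will", "Nate"),
    sources=("USLP draft v1", "BY&S note"),
    rationale="USLP × BY&S conflict, resolved by the principal",
)
node = crystallize(pfic)
print(node.title, node.cites)
```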

APPLIED HERE The vault already does this manually — the wiki/analyses/ folder is full of crystallised analyses (Scenario 1 to-do, Growth Corp fund-term expiry, FIRPTA mitigation plan, etc.). The upgrade is making it automatic: every simulation run + every conflict resolution becomes an Analysis node without manual filing.
06 · AUTOMATION

Event-driven hooks beyond write-time

Rules fire on writes; v2 adds session-level and schedule-level automation too.

SESSION TIMELINE

HOOK             | ACTION                          | DETAIL                         | TIME       | TYPE
ON SESSION START | load_relevant_context()         | recent decisions, open items   | t = 0      | user-driven event
ON DOC INGEST    | extract → validate → rule check | + dedup + contradiction check  | t = ingest | user-driven event
ON QUERY         | run engine → log decision       | surface conflicts to principal | t = query  | user-driven event
ON SESSION END   | crystallize_session()           | distill, file as analysis      | t = end    | user-driven event
ON SCHEDULE      | freshness + lint pass           | daily 03:00 · async            | t = +24h   | scheduled / background

Five hook points · only one (write-time) is in the current rule-engine design

Our rule engine fires on writes. v2 expands that to five hook points:

On session start: load context relevant to recent activity, so the principal walks into a briefing of what's open, what changed, and what needs a decision.
On document ingest: the write-time rules we already have.
On query: log the question + answer to the decision log; surface any conflict it touched.
On session end: crystallize the session into typed Analysis nodes.
On schedule: periodic background work, such as the freshness check, the lint pass, and low-confidence refresh prompts.
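
The five hook points could be wired up with a small registry along these lines. This is a sketch, not the rule engine's actual design: the `Hooks` class and the hook-point strings are assumptions, and the two handlers only borrow their names from the session timeline figure; their bodies are stand-ins.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

HOOK_POINTS = ("session_start", "doc_ingest", "query", "session_end", "schedule")


class Hooks:
    """Registry that maps each hook point to its handlers."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[..., object]]] = defaultdict(list)

    def on(self, point: str) -> Callable:
        """Decorator that registers a handler for one hook point."""
        if point not in HOOK_POINTS:
            raise ValueError(f"unknown hook point: {point}")

        def register(fn: Callable[..., object]) -> Callable[..., object]:
            self._handlers[point].append(fn)
            return fn

        return register

    def fire(self, point: str, **payload: object) -> List[object]:
        """Run every handler for a hook point, in registration order."""
        if point not in HOOK_POINTS:
            raise ValueError(f"unknown hook point: {point}")
        return [fn(**payload) for fn in self._handlers[point]]


hooks = Hooks()


@hooks.on("session_start")
def load_relevant_context(user: str) -> str:
    return f"briefing for {user}"          # stand-in for the real loader


@hooks.on("session_end")
def crystallize_session(user: str) -> str:
    return f"session for {user} filed as analysis"   # stand-in


print(hooks.fire("session_start", user="principal"))
```

Rejecting unknown hook points at registration time keeps a typo from silently creating a handler that never fires.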

APPLIED HERE Session-start context loading is especially valuable for advisor onboarding: a CPA logs in and gets a brief of what's open in their scope, what was decided since they last engaged, what needs their input. The scheduled freshness check runs nightly and pings the principal: "Tao Qing's address hasn't been re-verified since 2019 — confirm or update?"
07 · PRIVACY

PII filter on ingest, tiered by audience

The graph stores everything. The view filters at render time, per audience, per property.

SOURCE OF TRUTH · GRAPH
Malta passport · MT251400 · expiry 2035-01-17 · holder Dan Hu

PII FILTER · per audience, per property

VIEW           | TIER                      | RENDERED                     | NOTE
PRINCIPAL VIEW | FULL                      | MT251400 · expiry 2035-01-17 | + provenance trail
USLP BRIEFING  | FULL (KYC)                | MT251400 · expiry 2035-01-17 | required for BVI agent
BY&S BRIEFING  | PARTIAL (number redacted) | nationality: MT              | only what tax needs
PUBLIC SUMMARY | MASKED                    | ****1400 · EU passport       | no identifying digits

One fact in the graph. Four different rendered views. Compartmentalisation is enforced at render, not at storage.

One fact · four audience views · same graph underneath

The compartmentalisation we sketched on the Scopes page goes deeper than just “which entities does this professional see.” It applies per property per audience. A passport number is full-fidelity for USLP (they need it for KYC); the same fact is “Malta nationality, number redacted” for BY&S (they only need the jurisdiction for tax); for a public summary, it's masked to ****1400.

The graph stores the source of truth once. The filter at render time decides what each view exposes. The principal sees the full provenance trail, including which views received what, and when.
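
A minimal sketch of a render-time filter for one property. The `POLICY` table, the audience keys, and the `render` function are hypothetical names; treating any unlisted audience or property as redacted (default deny) is our assumption, not something the plan states.

```python
from typing import Dict, Optional

# policy[audience][property] is "full", "redact", or "mask".
POLICY: Dict[str, Dict[str, str]] = {
    "principal": {"passport_number": "full"},
    "uslp": {"passport_number": "full"},       # needed for KYC
    "bys": {"passport_number": "redact"},      # tax only needs nationality
    "public": {"passport_number": "mask"},
}


def render(prop: str, value: str, audience: str) -> Optional[str]:
    """Return what one audience may see of a stored value, or None if redacted."""
    rule = POLICY.get(audience, {}).get(prop, "redact")  # default deny
    if rule == "full":
        return value
    if rule == "mask":
        return "****" + value[-4:]   # keep only the last four characters
    return None


for audience in ("principal", "uslp", "bys", "public"):
    print(audience, render("passport_number", "MT251400", audience))
```

The stored value is never modified; each call derives a view from it, which is why the change stays in the render layer.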

APPLIED HERE Reinforces the existing scope-fence model with a finer-grained per-property tier. A BY&S briefing for the 585 property includes the address, lease, rent, and §871(d) election — but redacts passport numbers, ITIN digits beyond the last four, and bank account suffixes. None of it requires re-engineering the graph; the change is purely at the render layer.
SKIP · DOES NOT TRANSLATE

Two v2 ideas we're not adopting

Both are right for their original domain. Neither fits estate planning cleanly.

RETENTION DECAY · EBBINGHAUS CURVE

Forgetting doesn't apply to estate facts

v2's retention decay model has facts fade over time unless reinforced. Legal estate facts don't fade — your will is your will until you supersede it; your citizenship status is a hard fact, not a gradual one.

We do want freshness markers (last_verified) on the operational facts that drift — phone numbers, addresses, custody arrangements, current school. Different mechanic, same need: surface what's stale; refuse to use it for high-stakes decisions without refresh.
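
The freshness mechanic can be sketched as a plain age check on last_verified, with no decay curve involved. The `stale_facts` function and the one-year limit are assumptions, and the day-level dates are placeholders (the plan only records "2019" and "2026-04").

```python
from datetime import date, timedelta
from typing import Dict, List


def stale_facts(
    last_verified: Dict[str, date], today: date, max_age: timedelta
) -> List[str]:
    """Names of facts not re-verified within max_age, in alphabetical order."""
    return sorted(
        name for name, seen in last_verified.items() if today - seen > max_age
    )


# Placeholder days; only the year or month comes from the plan.
verified = {
    "tao_qing_address": date(2019, 1, 1),
    "itin": date(2026, 4, 1),
}

# Assumed limit of one year for operational facts.
print(stale_facts(verified, today=date(2026, 5, 10), max_age=timedelta(days=365)))
```

Unlike a decay model, this never lowers a fact's confidence on its own; it only surfaces what needs a refresh.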

MESH SYNC · MULTI-AGENT PARALLEL EDITS

Multi-agent sync is overkill

v2 handles concurrent writes from multiple AI agents working in parallel on the same wiki. For a single-family estate planner, this never happens — the principal is the only writer, the LLM is the only assistant.

It would become relevant if we open the model to multiple human advisors editing concurrently. But that contradicts the principal-led posture we set out with: advisors propose, the principal decides, the software brokers. No parallel editing needed.

PRIORITY · SEQUENCE

Three high-leverage, four nice-to-have

Not all upgrades are equal. Some are essential for the estate-planning case; others only matter at scale.

High-leverage — build first

essential for estate planning specifically
01 · Confidence scoring
Legal & tax decisions need to know which facts are load-bearing. Without it, the rule engine can't distinguish a known passport number from a guess.
02 · Supersession edges
Wills, trust deeds, POAs revise. Without explicit supersession, history is lost and auditors can't reconstruct what the principal actually decided when.
05 · Crystallization
Every simulation, reconciliation, and decision already produces structured output. Auto-filing them as Analysis nodes is nearly free and compounds knowledge over time.

Nice-to-have — add at scale

valuable but trigger only past certain thresholds
03 · Consolidation tier naming
Useful framing for documentation and onboarding new advisors. Not a code change: we already implement the tiers; v2 just gives them names.
04 · Hybrid search
Matters past ~200 entities. Today's plan has ~25. Add embeddings + BM25 when single-index queries get slow.
06 · Broader event hooks
Session-start context loading is a quick win for advisor onboarding. Scheduled freshness checks help with operational fact maintenance.
07 · PII filter per audience
Adds polish to the existing scope-fence model. Important when professional briefings get auto-generated and shared; not essential while the principal manually reviews each.
ROADMAP · IMPLEMENTATION SPECTRUM

Where we are. Where we're going.

v2's six-level ladder, with our progress marked.

LEVEL | NAME                | WHAT IT ADDS                                                         | STATUS
L1    | Minimal viable wiki | raw + wiki + index + schema (CLAUDE.md / principles.md)              | done
L2    | Add lifecycle       | confidence scoring + supersession + freshness markers                | in design
L3    | Add structure       | typed entities + typed edges + graph traversal queries               | in design
L5    | Add scale           | hybrid search + tier naming + PII filter + scheduled background work | later
L6    | Add collaboration   | mesh sync + shared/private scoping + work coordination               | skip
The schema document is the most important file in the system. It's what turns a generic LLM into a disciplined knowledge worker.
LLM Wiki v2 — applied here as principles.md + the typed alphabet