Legacy — Upgrades (informed by LLM Wiki v2)

01 · LIFECYCLE

Confidence scoring per fact

Every property carries a 0–1 score for how strong its source citation is. The rule engine reads it; the principal sees it.

Eight Dan Hu properties · confidence bars + source date

Every fact in the graph carries metadata: a confidence score (how strong is the citation), a valid_from, an optional valid_until, and a last_verified date. Confidence isn't decorative — the rule engine reads it.

High-confidence facts (≥0.95) are load-bearing: the tax simulator, the rules engine, and the briefing generator all use them directly. Medium-confidence facts (0.4–0.94) get a hover-tooltip showing the source so the user can judge. Low-confidence facts (<0.4) get a coral “refresh requested” flag — the system pauses before relying on them.
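
A minimal sketch of the three bands, assuming a simple in-memory record. The `Fact` class, its field layout, and the `confidence_tier` name are illustrative, not the system's actual schema; only the 0.95 and 0.4 cut-offs and the four metadata fields come from the text above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

HIGH_THRESHOLD = 0.95  # at or above: used directly by downstream engines
LOW_THRESHOLD = 0.4    # below: flagged "refresh requested"


@dataclass
class Fact:
    """One property in the graph plus its lifecycle metadata (hypothetical shape)."""
    name: str
    value: str
    confidence: float              # 0-1, strength of the source citation
    valid_from: date
    last_verified: date
    valid_until: Optional[date] = None


def confidence_tier(fact: Fact) -> str:
    """Map a fact's confidence score to the band described in the text."""
    if not 0.0 <= fact.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {fact.confidence}")
    if fact.confidence >= HIGH_THRESHOLD:
        return "high"    # load-bearing
    if fact.confidence >= LOW_THRESHOLD:
        return "medium"  # show the source in a tooltip
    return "low"         # pause and request a refresh


# Illustrative values only.
example = Fact("citizenship", "Malta", 0.97, date(2025, 1, 1), date(2025, 1, 1))
print(confidence_tier(example))
```

Keeping the thresholds as named constants means the rule engine and the UI read the same cut-offs rather than each hard-coding their own.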

APPLIED HERE The Trust 1000 deed amendment for Nate's beneficiary status reads several facts: Dan's spouse, Nate's date of birth, Nate's US citizenship, the existing trust beneficiary list. Each carries its own confidence. If Nate's address (used for IRS Form 3520) drops below 0.7, the system refuses to compute the amendment until the principal confirms a current address.
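
The refusal described here can be sketched as a per-computation gate. The function name, the exception, and the example scores are hypothetical; only the 0.7 floor on Nate's address is taken from the text.

```python
from typing import List, Mapping


class LowConfidenceError(Exception):
    """Raised when an input fact falls below the floor a computation requires."""


def require_confidence(
    confidences: Mapping[str, float],
    floors: Mapping[str, float],
) -> None:
    """Refuse to proceed if any required fact is missing or below its floor."""
    blocked: List[str] = sorted(
        name for name, floor in floors.items()
        if confidences.get(name, 0.0) < floor  # a missing fact counts as 0.0
    )
    if blocked:
        raise LowConfidenceError("confirm before computing: " + ", ".join(blocked))


# Hypothetical scores; only the 0.7 floor comes from the text.
scores = {"nate_address": 0.55, "nate_date_of_birth": 0.98}
floors = {"nate_address": 0.7}

try:
    require_confidence(scores, floors)
except LowConfidenceError as err:
    print(err)
```

Declaring floors per computation, rather than globally, lets a high-stakes amendment demand more than the general tiers do.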

02 · LIFECYCLE

Supersession as a first-class edge

When a will, trust deed, or POA is revised, the old version doesn't get overwritten. It gets explicitly superseded.

A typed chain of will revisions · supersedes is a first-class edge in the alphabet

Estate documents revise. Wills get rewritten when a child is born or a beneficiary dies. Trust deeds get amended when tax laws change. Powers of attorney expire and renew. The naive approach is to overwrite — but that loses history, and history is exactly what auditors and successor attorneys need.

With a supersedes edge in the alphabet, every revision creates a new node and links it to its predecessor. Old versions stay queryable; the “current” version is whichever node has no incoming supersedes edge, meaning nothing has superseded it.

APPLIED HERE Dan's BVI Will moves from v1 (standard substitute clause) to v2 (explicit PFIC carve-out for Nate, after USLP × BY&S reconciliation). v3 will be drafted when Pomelo or Arc is born. The decision log on the Professionals page shows v2's rationale; the supersession edge points to v1, which stays preserved.
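
A minimal sketch of the chain, assuming each supersedes edge points from the newer revision to the one it replaces (as v2 points to v1 here). Under that direction the current version is the node that nothing else supersedes. The node ids and function names are illustrative.

```python
from typing import Dict, Iterable, List


def current_version(nodes: Iterable[str], supersedes: Dict[str, str]) -> str:
    """Return the one revision that no other revision supersedes."""
    superseded = set(supersedes.values())
    heads = [node for node in nodes if node not in superseded]
    if len(heads) != 1:
        raise ValueError(f"expected exactly one current version, found {heads}")
    return heads[0]


def history(node: str, supersedes: Dict[str, str]) -> List[str]:
    """Walk from a revision back through every version it replaced."""
    chain = [node]
    while chain[-1] in supersedes:
        older = supersedes[chain[-1]]
        if older in chain:
            raise ValueError("supersedes edges form a cycle")
        chain.append(older)
    return chain


# supersedes[newer] = older
nodes = ["bvi_will_v1", "bvi_will_v2"]
supersedes = {"bvi_will_v2": "bvi_will_v1"}

print(current_version(nodes, supersedes))
print(history("bvi_will_v2", supersedes))
```

Nothing is deleted when v3 arrives: one node and one edge are added, and the same two queries keep working.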

03 · ARCHITECTURE

Memory consolidation tiers

Raw → episodic → semantic → procedural. Information moves up the stack as evidence accumulates.

Four memory tiers · higher = more compressed, more confident, longer-lived

Not all information is the same age, the same confidence, or the same lifespan. A raw passport PDF is one thing; the parsed fact “Dan is a Maltese citizen as of 2025” is another; the rule “use Maltese passport for BVI KYC, never Chinese” is yet another. They live at different levels of the stack.

The architecture already implements this implicitly — raw documents in raw/, sources in wiki/sources/, entities in wiki/entities/, principles in principles.md. v2 gives the tiers names and a promotion model: facts move up the stack as evidence confirms them.

APPLIED HERE A document arrives in the raw tier (the PDF itself). The LLM creates an episodic source page. Once the principal confirms, the typed semantic entities are written. If a rule fires consistently across many simulations, it gets promoted into procedural memory as a new principle or predicate. Lower tiers can be pruned without losing the higher ones: raw files are append-only, and source pages can be archived once their entities are confirmed.
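
One way to picture the promotion model is as an ordered enum with a one-step promote function. This is a sketch only: the `Tier` and `promote` names are hypothetical, and "evidence confirmed" stands in for whatever check applies at each level (principal confirmation, a rule firing consistently, and so on).

```python
from enum import IntEnum


class Tier(IntEnum):
    """The four tiers, lowest to highest, with the folders named in the text."""
    RAW = 0         # raw/
    EPISODIC = 1    # wiki/sources/
    SEMANTIC = 2    # wiki/entities/
    PROCEDURAL = 3  # principles.md


def promote(tier: Tier, evidence_confirmed: bool) -> Tier:
    """Move one step up the stack when evidence confirms; otherwise stay put."""
    if not evidence_confirmed or tier is Tier.PROCEDURAL:
        return tier
    return Tier(tier + 1)


print(promote(Tier.EPISODIC, evidence_confirmed=True).name)
```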

04 · SCALE

Hybrid search: BM25 + vector + graph

Three search streams, fused. Each catches what the others miss.

One query · three search streams · fused result list

Past ~200 entities, a single index file becomes too big for any LLM to scan in one pass. You need real retrieval. Three streams complement each other: BM25 nails exact-term recall (find every page that says “Hong Kong”), vector embeddings nail semantic recall (find pages about offshore jurisdictions even without the word), graph traversal nails relational recall (find every vehicle whose located_in is HK).

Reciprocal rank fusion merges the three lists. A result that ranks high in any one stream rises; a result that ranks high in all three rises further.
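
A minimal sketch of reciprocal rank fusion. Each stream contributes 1 / (k + rank) for every result it returns, and the contributions are summed. The constant k = 60 is the value conventionally used with this method and is an assumption here, as are the page ids.

```python
from collections import defaultdict
from typing import DefaultDict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists into one, best first."""
    scores: DefaultDict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, page_id in enumerate(ranking, start=1):
            scores[page_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda page_id: (-scores[page_id], page_id))


# Hypothetical result lists from the three streams.
bm25 = ["a", "b", "c"]
vector = ["b", "a", "d"]
graph = ["b", "c", "a"]

print(reciprocal_rank_fusion([bm25, vector, graph]))
```

Because only ranks are used, the three streams never need their raw scores normalised against each other.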

APPLIED HERE The plan has ~25 entities today. Single index works fine. At scale (multi-family, advisor mode, or 5+ years of accumulated revisions), it won't — hybrid search is what keeps queries fast without the LLM having to read every page.

05 · AUTOMATION

Crystallization: completed work becomes structure

Every simulation, reconciliation, or decision auto-distills into a typed Analysis node.

Work events → Crystallizer → typed Analysis nodes in the graph

Karpathy's original wiki said: “good answers can be filed back as new pages.” v2 takes this further. Every completed work product — a scenario simulation, a cross-professional reconciliation, an override decision — is itself a piece of knowledge worth keeping. The crystallizer auto-extracts the conclusion, the entities involved, the source spans, the rationale, and files it as a typed Analysis node.
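
A sketch of what a typed Analysis node might hold, using the four things the crystallizer is described as extracting. The class shape, the `kind` values, and the event dictionary are assumptions; in practice the extraction step would be done by the LLM, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple

ALLOWED_KINDS = {"simulation", "reconciliation", "decision"}


@dataclass(frozen=True)
class Analysis:
    """A completed piece of work, filed back into the graph (hypothetical shape)."""
    kind: str
    conclusion: str
    entities: Tuple[str, ...]
    source_spans: Tuple[str, ...]
    rationale: str


def crystallize(event: Mapping[str, Any]) -> Analysis:
    """Turn an already-structured work event into an immutable Analysis node."""
    if event["kind"] not in ALLOWED_KINDS:
        raise ValueError(f"unknown work event kind: {event['kind']}")
    return Analysis(
        kind=event["kind"],
        conclusion=event["conclusion"].strip(),
        entities=tuple(sorted(set(event["entities"]))),  # de-duplicated, stable order
        source_spans=tuple(event["source_spans"]),
        rationale=event["rationale"].strip(),
    )


# Placeholder event; the field contents are illustrative.
node = crystallize({
    "kind": "reconciliation",
    "conclusion": " example conclusion ",
    "entities": ["will", "trust", "will"],
    "source_spans": ["source-page#1"],
    "rationale": "example rationale",
})
print(node)
```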

This closes the loop: the system's own work product becomes input to future reasoning. The PFIC reconciliation gets cited the next time a similar conflict arises. The S1 simulation outcome informs scenario-3 planning. Reasoning compounds instead of evaporating.

APPLIED HERE The vault already does this manually — the wiki/analyses/ folder is full of crystallised analyses (Scenario 1 to-do, Growth Corp fund-term expiry, FIRPTA mitigation plan, etc.). The upgrade is making it automatic: every simulation run + every conflict resolution becomes an Analysis node without manual filing.

06 · AUTOMATION

Event-driven hooks beyond write-time

Rules fire on writes; v2 adds session-level and schedule-level automation too.

Five hook points · only one (write-time) is in the current rule-engine design

Our rule engine fires on writes. v2 expands that to five hook points:

On session start, load context relevant to recent activity, so the principal walks into a briefing of what's open, what changed, and what needs a decision. On document ingest, run the write-time rules we already have. On query, log the question and answer to the decision log, and surface any conflict the query touched. On session end, crystallize the session into typed Analysis nodes. On schedule, run periodic background work: freshness checks, a lint pass, and low-confidence refresh prompts.
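
The five hook points can be sketched as a small registry that handlers subscribe to. The `Hook` and `HookRegistry` names and the payload shape are hypothetical; the sketch shows only the dispatch pattern, not the existing rule engine.

```python
from collections import defaultdict
from enum import Enum
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], Any]


class Hook(Enum):
    SESSION_START = "session_start"
    DOCUMENT_INGEST = "document_ingest"  # the write-time hook that exists today
    QUERY = "query"
    SESSION_END = "session_end"
    SCHEDULE = "schedule"


class HookRegistry:
    """Maps each hook point to the handlers that should run when it fires."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[Hook, List[Handler]] = defaultdict(list)

    def on(self, hook: Hook) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one hook point."""
        def register(handler: Handler) -> Handler:
            self._handlers[hook].append(handler)
            return handler
        return register

    def fire(self, hook: Hook, payload: Dict[str, Any]) -> List[Any]:
        """Run every handler for the hook, in registration order."""
        return [handler(payload) for handler in self._handlers[hook]]


registry = HookRegistry()


@registry.on(Hook.SCHEDULE)
def freshness_check(payload: Dict[str, Any]) -> str:
    return f"freshness check for {payload['scope']}"


print(registry.fire(Hook.SCHEDULE, {"scope": "all"}))
```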

APPLIED HERE Session-start context loading is especially valuable for advisor onboarding: a CPA logs in and gets a brief of what's open in their scope, what was decided since they last engaged, what needs their input. The scheduled freshness check runs nightly and pings the principal: "Tao Qing's address hasn't been re-verified since 2019 — confirm or update?"

07 · PRIVACY

PII filter on ingest, tiered by audience

The graph stores everything. The view filters at render time, per audience, per property.

One fact · four audience views · same graph underneath

The compartmentalisation we sketched on the Scopes page goes deeper than just “which entities does this professional see.” It applies per property per audience. A passport number is full-fidelity for USLP (they need it for KYC); the same fact is “Malta nationality, number redacted” for BY&S (they only need the jurisdiction for tax); for a public summary, it's masked to ****1400.

The graph stores the source of truth once. The filter at render time decides what each view exposes. The principal sees the full provenance trail, including which views received what, and when.
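
A minimal sketch of a render-time filter keyed by property and audience. The policy table, the rule names, and the placeholder passport value are assumptions; the three treatments (full, redacted, last four digits) follow the passport example above, and anything without an explicit rule is redacted by default.

```python
from typing import Callable, Dict, Tuple

Rule = Callable[[str], str]


def full(value: str) -> str:
    return value


def redacted(value: str) -> str:
    return "[redacted]"


def last_four(value: str) -> str:
    return "****" + value[-4:]


# (property, audience) -> rule. Hypothetical policy table.
POLICY: Dict[Tuple[str, str], Rule] = {
    ("passport_number", "principal"): full,
    ("passport_number", "USLP"): full,       # needed for KYC
    ("passport_number", "BY&S"): redacted,   # only the jurisdiction matters for tax
    ("passport_number", "public"): last_four,
}


def render(prop: str, value: str, audience: str) -> str:
    """Apply the audience's rule at render time; the stored value is never changed."""
    rule = POLICY.get((prop, audience), redacted)  # default deny
    return rule(value)


stored = "XX001400"  # placeholder value, not a real number
for audience in ("USLP", "BY&S", "public"):
    print(audience, render("passport_number", stored, audience))
```

Defaulting to redaction means a new audience or property exposes nothing until someone writes a rule for it.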

APPLIED HERE Reinforces the existing scope-fence model with a finer-grained per-property tier. A BY&S briefing for the 585 property includes the address, lease, rent, and §871(d) election — but redacts passport numbers, ITIN digits beyond the last four, and bank account suffixes. None of it requires re-engineering the graph; the change is purely at the render layer.

SKIP · DOES NOT TRANSLATE

Two v2 ideas we're not adopting

Both are right for their original domain. Neither fits estate planning cleanly.

RETENTION DECAY · EBBINGHAUS CURVE

Forgetting doesn't apply to estate facts

v2's retention decay model has facts fade over time unless reinforced. Legal estate facts don't fade — your will is your will until you supersede it; your citizenship status is a hard fact, not a gradual one.

We do want freshness markers (last_verified) on the operational facts that drift — phone numbers, addresses, custody arrangements, current school. Different mechanic, same need: surface what's stale; refuse to use it for high-stakes decisions without refresh.
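
A sketch of the freshness check, assuming a single maximum age for all operational facts. The one-year window, the fact names, and the dates are illustrative; a real policy would likely vary the window by fact type.

```python
from datetime import date, timedelta
from typing import Dict, List


def stale_facts(
    last_verified: Dict[str, date],
    today: date,
    max_age: timedelta = timedelta(days=365),  # assumed window
) -> List[str]:
    """Return the names of facts whose last verification is older than max_age."""
    return sorted(
        name for name, verified in last_verified.items()
        if today - verified > max_age
    )


# Illustrative dates only.
verified_on = {
    "address": date(2019, 6, 1),
    "phone_number": date(2025, 3, 1),
}
print(stale_facts(verified_on, today=date(2025, 6, 1)))
```

Unlike retention decay, nothing here lowers a fact's standing gradually: a fact is either within its window or it is surfaced for re-verification.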

MESH SYNC · MULTI-AGENT PARALLEL EDITS

Multi-agent sync is overkill

v2 handles concurrent writes from multiple AI agents working in parallel on the same wiki. For a single-family estate planner, this never happens — the principal is the only writer, the LLM is the only assistant.

It would become relevant if we open the model to multiple human advisors editing concurrently. But that contradicts the principal-led posture we set out with: advisors propose, the principal decides, the software brokers. No parallel editing needed.

PRIORITY · SEQUENCE

Three high-leverage, four nice-to-have

Not all upgrades are equal. Some are essential for the estate-planning case; others only matter at scale.

High-leverage — build first

essential for estate planning specifically

01

Confidence scoring Legal & tax decisions need to know which facts are load-bearing. Without it, the rule engine can't distinguish a known passport number from a guess.

02

Supersession edges Wills, trust deeds, POAs revise. Without explicit supersession, history is lost and auditors can't reconstruct what the principal actually decided when.

05

Crystallization Every simulation, reconciliation, and decision already produces structured output. Auto-filing them as Analysis nodes is nearly free and compounds knowledge over time.

Nice-to-have — add at scale

valuable, but only past certain thresholds

03

Consolidation tier naming Useful framing for documentation and onboarding new advisors. Not a code change — we already implement the tiers; v2 just gives them names.

04

Hybrid search Matters past ~200 entities. Today's plan has ~25. Add embeddings + BM25 when single-index queries get slow.

06

Broader event hooks Session-start context loading is a quick win for advisor onboarding. Scheduled freshness checks help with operational fact maintenance.

07

PII filter per audience Adds polish to the existing scope-fence model. Important when professional briefings get auto-generated and shared; not essential while the principal manually reviews each.

ROADMAP · IMPLEMENTATION SPECTRUM

Where we are. Where we're going.

v2's six-level ladder, with our progress marked.

L1

Minimal viable wiki

raw + wiki + index + schema (CLAUDE.md / principles.md)

done

L2

Add lifecycle

confidence scoring + supersession + freshness markers

in design

L3

Add structure

typed entities + typed edges + graph traversal queries

in design

L4

Add automation

rule engine + write-time hooks + crystallization + session hooks

L5

Add scale

hybrid search + tier naming + PII filter + scheduled background work

later

L6

Add collaboration

mesh sync + shared/private scoping + work coordination

skip

Seven upgrades, applied.