Seven upgrades, applied.
What we're adding to the design, what we're skipping, and why. Each upgrade is named, sketched, and grounded in the estate-planning case.
Confidence scoring per fact
Every property carries a 0–1 score for how strong its source citation is. The rule engine reads it; the principal sees it.
Eight Dan Hu properties · confidence bars + source date
Every fact in the graph carries metadata: a confidence score (how strong is the citation), a valid_from, an optional valid_until, and a last_verified date. Confidence isn't decorative — the rule engine reads it.
High-confidence facts (≥0.95) are load-bearing: the tax simulator, the rules engine, and the briefing generator all use them directly. Medium-confidence facts (0.4 up to, but not including, 0.95) get a hover tooltip showing the source so the user can judge. Low-confidence facts (<0.4) get a coral “refresh requested” flag, and the system pauses before relying on them.
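As a rough illustration, the three bands can be expressed as a small classifier over the fact metadata described above. This is a minimal sketch, not the rule engine itself: the Fact class, the confidence_tier function, and the example dates are assumptions made for illustration; only the field names and the 0.95 and 0.4 thresholds come from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

HIGH_THRESHOLD = 0.95  # at or above: load-bearing (threshold from the text)
LOW_THRESHOLD = 0.4    # below: "refresh requested" (threshold from the text)


@dataclass
class Fact:
    """Hypothetical container for one property and its metadata."""
    name: str
    value: str
    confidence: float
    valid_from: date
    last_verified: date
    valid_until: Optional[date] = None


def confidence_tier(fact: Fact) -> str:
    """Map a 0-1 confidence score onto the three bands."""
    if not 0.0 <= fact.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {fact.confidence}")
    if fact.confidence >= HIGH_THRESHOLD:
        return "high"    # used directly by downstream consumers
    if fact.confidence >= LOW_THRESHOLD:
        return "medium"  # shown with a source tooltip
    return "low"         # flagged for refresh before use


# Illustrative values only; the score and dates are made up.
citizenship = Fact(
    name="citizenship",
    value="Malta",
    confidence=0.97,
    valid_from=date(2025, 1, 1),
    last_verified=date(2025, 6, 1),
)
print(confidence_tier(citizenship))
```

Treating the medium band as "everything from 0.4 up to the high threshold" avoids a gap for scores such as 0.945.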
Supersession as a first-class edge
When a will, trust deed, or POA is revised, the old version doesn't get overwritten. It gets explicitly superseded.
A typed chain of will revisions · supersedes is a first-class edge in the alphabet
Estate documents revise. Wills get rewritten when a child is born or a beneficiary dies. Trust deeds get amended when tax laws change. Powers of attorney expire and renew. The naive approach is to overwrite — but that loses history, and history is exactly what auditors and successor attorneys need.
With a supersedes edge in the alphabet, every revision creates a new node and links it to its predecessor. Old versions stay queryable; the “current” version is whichever has no incoming supersedes edge, that is, the one nothing has superseded.
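A minimal sketch of querying such a chain follows. It assumes each supersedes edge is stored as a (newer, older) pair pointing from the revision to its predecessor, so the current version is the node that nothing supersedes. The node ids and both function names are hypothetical.

```python
def current_versions(nodes, supersedes):
    """Return the nodes that no other node supersedes.

    supersedes is a list of (newer, older) pairs.
    """
    superseded = {older for _, older in supersedes}
    return [n for n in nodes if n not in superseded]


def history(current, supersedes):
    """Walk from a version back through its predecessors, newest first."""
    older_of = dict(supersedes)
    chain, seen = [current], {current}
    while chain[-1] in older_of:
        nxt = older_of[chain[-1]]
        if nxt in seen:  # guard against a malformed, cyclic chain
            raise ValueError("cycle in supersedes chain")
        chain.append(nxt)
        seen.add(nxt)
    return chain


# Hypothetical will revisions.
wills = ["will_v1", "will_v2", "will_v3"]
edges = [("will_v2", "will_v1"), ("will_v3", "will_v2")]

print(current_versions(wills, edges))
print(history("will_v3", edges))
```

Nothing is ever deleted in this model: an audit query is just a walk along the chain.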
Memory consolidation tiers
Raw → episodic → semantic → procedural. Information moves up the stack as evidence accumulates.
Four memory tiers · higher = more compressed, more confident, longer-lived
Not all information is the same age, the same confidence, or the same lifespan. A raw passport PDF is one thing; the parsed fact “Dan is a Maltese citizen as of 2025” is another; the rule “use Maltese passport for BVI KYC, never Chinese” is yet another. They live at different levels of the stack.
The architecture already implements this implicitly — raw documents in raw/, sources in wiki/sources/, entities in wiki/entities/, principles in principles.md. v2 gives the tiers names and a promotion model: facts move up the stack as evidence confirms them.
Hybrid search: BM25 + vector + graph
Three search streams, fused. Each catches what the others miss.
One query · three search streams · fused result list
Past ~200 entities, a single index file becomes too big for any LLM to scan in one pass. You need real retrieval. Three streams complement each other: BM25 nails exact-term recall (find every page that says “Hong Kong”), vector embeddings nail semantic recall (find pages about offshore jurisdictions even without the word), graph traversal nails relational recall (find every vehicle whose located_in is HK).
Reciprocal rank fusion merges the three lists. A result that ranks high in any one stream rises; a result that ranks high in all three rises further.
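Reciprocal rank fusion itself is only a few lines. The sketch below assumes each stream returns an ordered list of page ids, best first, and uses the conventional smoothing constant k = 60; the source does not specify a value, and the page ids are made up for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists into one.

    Each result earns 1 / (k + rank) from every stream it appears in,
    with rank starting at 1. Results are returned best first; ties are
    broken by id so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one query from the three streams.
bm25 = ["hk_holdco", "bvi_trust", "sg_fund"]
vector = ["bvi_trust", "hk_holdco", "cayman_spv"]
graph = ["hk_holdco", "hk_property"]

fused = reciprocal_rank_fusion([bm25, vector, graph])
print(fused)
```

Because only ranks are used, the three streams never need their raw scores normalised against each other.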
Crystallization: completed work becomes structure
Every simulation, reconciliation, or decision auto-distills into a typed Analysis node.
Work events → Crystallizer → typed Analysis nodes in the graph
Karpathy's original wiki said: “good answers can be filed back as new pages.” v2 takes this further. Every completed work product — a scenario simulation, a cross-professional reconciliation, an override decision — is itself a piece of knowledge worth keeping. The crystallizer auto-extracts the conclusion, the entities involved, the source spans, the rationale, and files it as a typed Analysis node.
This closes the loop: the system's own work product becomes input to future reasoning. The PFIC reconciliation gets cited the next time a similar conflict arises. The S1 simulation outcome informs scenario-3 planning. Reasoning compounds instead of evaporating.
The wiki/analyses/ folder is already full of crystallised analyses (Scenario 1 to-do, Growth Corp fund-term expiry, FIRPTA mitigation plan, etc.). The upgrade is making it automatic: every simulation run and every conflict resolution becomes an Analysis node without manual filing.
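One possible shape for a typed Analysis node, using the fields the crystallizer is described as extracting (conclusion, entities, source spans, rationale). Everything else here is an assumption for illustration: the class names, the dict layout of a work event, and the three kind labels.

```python
from dataclasses import dataclass

# Hypothetical labels for the three kinds of work product named in the text.
ALLOWED_KINDS = {"simulation", "reconciliation", "override"}


@dataclass(frozen=True)
class SourceSpan:
    """Pointer to the lines in a source page that support the conclusion."""
    path: str
    start_line: int
    end_line: int


@dataclass(frozen=True)
class Analysis:
    kind: str
    conclusion: str
    entities: tuple
    source_spans: tuple
    rationale: str


def crystallize(event: dict) -> Analysis:
    """Turn a completed work event (assumed to be a dict) into an Analysis."""
    if event["kind"] not in ALLOWED_KINDS:
        raise ValueError(f"unknown work product kind: {event['kind']}")
    return Analysis(
        kind=event["kind"],
        conclusion=event["conclusion"],
        entities=tuple(event["entities"]),
        source_spans=tuple(SourceSpan(*span) for span in event["spans"]),
        rationale=event["rationale"],
    )


# Placeholder event; the strings are stand-ins, not real findings.
node = crystallize({
    "kind": "reconciliation",
    "conclusion": "placeholder conclusion",
    "entities": ["entity_a", "entity_b"],
    "spans": [("wiki/sources/example.md", 10, 14)],
    "rationale": "placeholder rationale",
})
print(node.kind, node.entities)
```

Making the node immutable (frozen) fits the supersession model: a revised analysis would be a new node rather than an in-place edit.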
Event-driven hooks beyond write-time
Rules fire on writes; v2 adds session-level and schedule-level automation too.
Five hook points · only one (write-time) is in the current rule-engine design
Our rule engine fires on writes. v2 expands that to five hook points:
On session start, load context relevant to recent activity — the principal walks into a briefing of what's open, what changed, what needs decision. On document ingest, the write-time rules we already have. On query, log the question + answer to the decision log; surface any conflict it touched. On session end, crystallize the session into typed Analysis nodes. On schedule, periodic background work: freshness check, lint pass, low-confidence refresh prompts.
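The five hook points could be wired up with a small registry like the one below. This is a sketch under assumptions: the Hook enum, the HookRegistry class, and the log_query handler are illustrative names, not part of the existing rule engine.

```python
from collections import defaultdict
from enum import Enum


class Hook(Enum):
    SESSION_START = "session_start"      # load context for recent activity
    DOCUMENT_INGEST = "document_ingest"  # existing write-time rules
    QUERY = "query"                      # log question + answer
    SESSION_END = "session_end"          # crystallize the session
    SCHEDULE = "schedule"                # freshness check, lint, refresh prompts


class HookRegistry:
    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, hook):
        """Decorator that registers a handler for one hook point."""
        def register(fn):
            self._handlers[hook].append(fn)
            return fn
        return register

    def fire(self, hook, **payload):
        """Call every handler for the hook, in registration order."""
        return [handler(**payload) for handler in self._handlers[hook]]


registry = HookRegistry()
decision_log = []


@registry.on(Hook.QUERY)
def log_query(question, answer):
    # Hypothetical handler: append the exchange to the decision log.
    decision_log.append((question, answer))
    return "logged"


print(registry.fire(Hook.QUERY, question="example question", answer="example answer"))
```

Firing a hook with no registered handlers simply returns an empty list, so new hook points can be added before any rules use them.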
PII filter on ingest, tiered by audience
The graph stores everything. The view filters at render time, per audience, per property.
One fact · four audience views · same graph underneath
The compartmentalisation we sketched on the Scopes page goes deeper than just “which entities does this professional see.” It applies per property per audience. A passport number is full-fidelity for USLP (they need it for KYC); the same fact is “Malta nationality, number redacted” for BY&S (they only need the jurisdiction for tax); for a public summary, it's masked to ****1400.
The graph stores the source of truth once. The filter at render time decides what each view exposes. The principal sees the full provenance trail, including which views received what, and when.
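A sketch of a render-time filter for the passport example: one stored fact, one view function per audience. The audience labels USLP and BY&S and the three output shapes come from the text; treating the principal as the fourth audience, the function names, and the passport number (a fake placeholder ending in 1400) are assumptions.

```python
def full(fact):
    """Full-fidelity view."""
    return f"{fact['nationality']} passport {fact['number']}"


def jurisdiction_only(fact):
    """Jurisdiction is visible; the number is withheld."""
    return f"{fact['nationality']} nationality, number redacted"


def masked(fact):
    """Only the last four characters of the number are shown."""
    return "****" + fact["number"][-4:]


# Per-property policy: audience -> view function.
PASSPORT_POLICY = {
    "principal": full,
    "USLP": full,
    "BY&S": jurisdiction_only,
    "public": masked,
}


def render(fact, audience, policy):
    """Apply the audience's view; unknown audiences see nothing."""
    view = policy.get(audience)
    return view(fact) if view else None


# Stored once in the graph; the number is a made-up placeholder.
passport = {"nationality": "Malta", "number": "XX0001400"}

for audience in PASSPORT_POLICY:
    print(audience, "->", render(passport, audience, PASSPORT_POLICY))
```

Defaulting to None for an unlisted audience means a new view exposes nothing until someone writes a policy entry for it.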
Two v2 ideas we're not adopting
Both are right for their original domain. Neither fits estate planning cleanly.
Forgetting doesn't apply to estate facts
v2's retention decay model has facts fade over time unless reinforced. Legal estate facts don't fade — your will is your will until you supersede it; your citizenship status is a hard fact, not a gradual one.
We do want freshness markers (last_verified) on the operational facts that drift — phone numbers, addresses, custody arrangements, current school. Different mechanic, same need: surface what's stale; refuse to use it for high-stakes decisions without refresh.
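The freshness check can be sketched as a comparison of last_verified against a per-kind review window. The window lengths, the kind labels, and the StaleFactError name are all assumptions chosen for illustration; the source only says that stale operational facts should be surfaced and not used for high-stakes decisions without a refresh.

```python
from datetime import date, timedelta

# Hypothetical review windows per kind of operational fact.
MAX_AGE = {
    "phone": timedelta(days=180),
    "address": timedelta(days=365),
}
DEFAULT_MAX_AGE = timedelta(days=365)


class StaleFactError(Exception):
    """Raised when a high-stakes step tries to rely on a stale fact."""


def is_stale(kind, last_verified, today):
    """True when the fact has not been verified within its window."""
    return today - last_verified > MAX_AGE.get(kind, DEFAULT_MAX_AGE)


def require_fresh(kind, last_verified, today):
    """Refuse to proceed on a stale fact; the caller should request a refresh."""
    if is_stale(kind, last_verified, today):
        raise StaleFactError(f"{kind} last verified {last_verified}; refresh needed")


today = date(2025, 12, 1)                        # illustrative date
print(is_stale("phone", date(2025, 1, 1), today))    # 334 days old, window 180
print(is_stale("address", date(2025, 1, 1), today))  # 334 days old, window 365
```

Unlike retention decay, nothing here lowers the fact's confidence or removes it; the fact is simply held back until someone re-verifies it.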
Multi-agent sync is overkill
v2 handles concurrent writes from multiple AI agents working in parallel on the same wiki. For a single-family estate planner, this never happens — the principal is the only writer, the LLM is the only assistant.
It would become relevant if we open the model to multiple human advisors editing concurrently. But that contradicts the principal-led posture we set out with: advisors propose, the principal decides, the software brokers. No parallel editing needed.
Three high-leverage, four nice-to-have
Not all upgrades are equal. Some are essential for the estate-planning case; others only matter at scale.
High-leverage — build first
Nice-to-have — add at scale
Where we are. Where we're going.
v2's six-level ladder, with our progress marked.
The schema document is the most important file in the system. It's what turns a generic LLM into a disciplined knowledge worker.
LLM Wiki v2, applied here as principles.md + the typed alphabet