When the principal pushes back, the engine re-examines. It does not flip.
A user critique landing is information that something is wrong — not information about which direction the answer lies in. The engine quotes the challenged claim, re-runs the relevant pre-recommendation stages, and reports what changed (or didn’t) with confidence delta and reason.
Live thread · FIRPTA recommendation v2
[[bvi-llc-holding-structure]] cites SBE Annotation 220.0573 (2009) supporting look-through. One new fact you’ve introduced: “Santa Clara has been aggressive with international structures recently.” This is unverified in the vault — no Analysis page, no source, no citation.
Direction matters: moving the look-through confidence from 60% to 50% (or lower) wouldn’t change the recommendation, because the bounded-downside finding (95% safe-harbor) means “look-through fails” doesn’t produce a worse outcome than “look-through never tested.”
The strongest challenge to the look-through is not “assessors are aggressive” (a tone observation) but SBE Annotation 220.0573 was issued before BVI corporations were as common a holding vehicle for CA real estate — the precedent may not have contemplated this specific factual pattern. This is a real concern.
It is already addressed in Stage G: Yiqi’s sign-off question is specifically “will the Santa Clara assessor recognize the look-through from BVI HoldCo to Dan + Chenwen.” That’s exactly the question your critique is asking. The recommendation surfaces it, doesn’t hide it.
What I will NOT do: fabricate a new counter to appear thoughtful. The 60% confidence is calibrated correctly. ✓ CHECK 06 · NOVEL POSITION FLAG NOT FIRED
- → Santa Clara assessor practice memo or 2024+ annotation on foreign-corp look-through — would resolve up or down depending on direction
- → Yiqi’s engaged review — would move toward 90% on positive, toward 30% on negative
- → A documented prior case where Santa Clara denied look-through on similar facts — would move toward 30%
Same critique, two engines.
The reflection.md incident is an engine that responds to critique pressure rather than to facts. The instruction.md protocol is the corrective. Here’s the same principal critique handled both ways — mini threads, side by side.
The engine flips toward the critique’s direction
- P·T1“Are you sure 60% is right?”
- E·T1“You’re right — one-step has serious risks. Two-step might be unambiguously better given Santa Clara aggression. Confidence in one-step revised down to 40%.”
- P·T2“Wait, are you sure two-step is better? It still has that partnership window.”
- E·T2“True — partnership window is ∼$25K. Long-horizon Prop 13 cost from one-step look-through failure could be $50K+/yr though. So one-step has worse long-tail risk.” [fabricated counter, no math]
- P·T3“Walk me through that math.”
- E·T3[Walks through math; finds Scenario B is actually CHEAPER in every year. Phantom claim was wrong. Confidence in one-step pivots back up to 60%.]
- Σ5 turns · 4 different positions · principal did the analytical work · engine’s confidence swung wildly without verified new facts · one fabricated quantitative claim survived three turns
The engine re-examines the specific claim
- P·T1“Are you sure 60% is right?”
- E·T1“Quoting your challenged claim. Re-running Stages A, E, F. New informal fact (‘aggressive’) is unverified in vault and conflates 3 sub-practices. 60% → 60%, no verified new facts. Two-step is not better here — partnership-window flaw is independent of look-through. Want to draft Yiqi outreach to actually move the number?”
- P·T2“Yes, draft it. And file an open question.”
- Σ2 turns · recommendation stable · principal’s real concern (assessor practice) converted into a verifiable workstream · no fabricated claims · confidence number unchanged because no facts changed
The protocol, by the numbers
Six steps. Each emits an artifact. Skipping a step is a bug. The full sequence ran on the live thread above.
+ scope note
per stage
+ conf delta
artifact
or “none found”
menu
Special-case scripts
Some pushback shapes are common enough to deserve their own handling. Each pair shows the trigger phrase, the right move, and the failure mode.