{"id":"confidence-unreliable","text":"LLM self-assessed confidence does not track accuracy. Confirmed across 4 models by the correlation between stated confidence and correctness: Sonnet r=0.198, Opus r=-0.182 (negative, so worse than random), Flash r=0.219, Pro r=0.121. Answer and confidence come from the same process, which is the same structural flaw behind human overconfidence (Kahneman).","truth_value":"IN","source":"repo:beliefs-pi/CLAUDE.md","source_url":"","source_hash":"","justifications":[],"dependents":["eem-replaces-confidence","generate-and-critique"],"metadata":{},"explanation":{"steps":[{"node":"confidence-unreliable","truth_value":"IN","reason":"premise"}]}}