{"results":[{"id":"confidence-unreliable","text":"LLM self-assessed confidence does not track accuracy. Confirmed across 4 models: Sonnet r=0.198, Opus r=-0.182 (worse than random), Flash r=0.219, Pro r=0.121. Answer and confidence come from the same process — same structural flaw as human overconfidence (Kahneman)","truth_value":"IN","justification_count":0,"dependent_count":2,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"evidence-retraction-rate","text":"13-37% of derived beliefs are retracted per review round across multiple expert KBs. Self-correction works — the system finds and removes its own errors","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"generate-and-critique","text":"LLMs are extraordinary generators but unreliable critics. The belief registry externalizes and persists the critic's judgments, replacing internal self-assessment with external structured tracking","truth_value":"IN","justification_count":1,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"how-agents-use-eem","text":"LLM agents use EEM by: querying beliefs via search/show/explain before answering, citing node IDs for auditability, running derive to generate new beliefs from existing ones, running review-beliefs to self-audit, recording nogoods when contradictions appear. The agent does not need to be told it is an expert — the knowledge base speaks for itself","truth_value":"IN","justification_count":1,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"self-critique-harmful","text":"LLM revision based on self-critique makes answers worse: Sonnet -11pp, Flash -21pp, Pro -56.5pp. Self-critique fails because the same model that made the error evaluates the error","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"self-improvement","text":"The system finds problems in itself. Each improvement improves the system's ability to find the next improvement — exponential compounding vs linear improvement in static systems","truth_value":"IN","justification_count":1,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null}],"count":6,"limit":20,"offset":0}