{"results":[{"id":"evidence-beliefs-ablation","text":"Beliefs alone outperform beliefs + expert prompt: Opus 100% vs 94.2% (+5.8pp), Sonnet 94.2% vs 91.8% (+2.4pp). Adding expert prompt hurts — agent trusts its 'expertise' instead of consulting the knowledge base","truth_value":"IN","justification_count":0,"dependent_count":2,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"evidence-depth-ceiling","text":"Beliefs beyond depth 8 do not survive review. Retraction rate: 0% at depth 0, rising to 100% at depth 9+. The universal TMS is wide rather than deep","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"evidence-dual-path","text":"Opus + dual-path architecture achieves 98.5% A/B across 3,853 questions. Zero D/F grades — eliminated the failure tail entirely","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"evidence-expert-vs-baseline","text":"Expert-service with EEM scores 88% A-grade vs agents-python 33% on same 50 questions, 15x faster","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"evidence-model-compensation","text":"EEM compensates for model size: Sonnet+beliefs approximates Opus without beliefs. Haiku with dual-path achieves 94% A+B, matching Opus at 98%","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"evidence-retraction-rate","text":"13-37% of derived beliefs are retracted per review round across multiple expert KBs. Self-correction works — the system finds and removes its own errors","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"how-to-start","text":"To start using EEM: (1) reasons init — creates reasons.db, (2) add premises from observations with reasons add, (3) add justified conclusions with --sl to link dependencies, (4) use reasons derive to find connections, (5) use reasons review-beliefs to audit, (6) retract when evidence changes and let cascades propagate","truth_value":"IN","justification_count":1,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null},{"id":"scale-evidence","text":"EEM scales from small domains (237 beliefs, aap-expert) to large enterprises (12,731 beliefs, redhat-expert). 40+ expert knowledge bases built across code, product, project, and domain-specific experts","truth_value":"IN","justification_count":1,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null}],"count":8,"limit":20,"offset":0}