{"results":[{"id":"generate-and-critique","text":"LLMs are extraordinary generators but unreliable critics. The belief registry externalizes and persists the critic's judgments, replacing internal self-assessment with external structured tracking","truth_value":"IN","justification_count":1,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"hybrid-tms","text":"ftl-reasons is a hybrid TMS: symbolic TMS handles structure (justifications, propagation, cascades, backtracking, challenge/defend) while LLMs handle semantic operations (derive generates beliefs, review-beliefs critiques them, contradiction detection finds nogoods)","truth_value":"IN","justification_count":1,"dependent_count":4,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"model-stacking","text":"Multi-pass agent pattern: Model A generates candidates → TMS records with provenance → Review critiques (machine + human) → Model B receives validated beliefs → Model B derives new beliefs → Review critiques derivations → Repeat. Each level is a full model pass with fresh context and critique pipeline as quality gate","truth_value":"IN","justification_count":1,"dependent_count":0,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""},{"id":"self-critique-harmful","text":"LLM revision based on self-critique makes answers worse: Sonnet -11pp, Flash -21pp, Pro -56.5pp. Self-critique fails because the same model that made the error evaluates the error","truth_value":"IN","justification_count":0,"dependent_count":1,"challenges":[],"last_reviewed":null,"review_result":null,"source_type":""}],"count":4,"limit":20,"offset":0}