Status: IN
Revising an LLM answer based on the model's own self-critique makes answers worse: Sonnet -11pp, Flash -21pp, Pro -56.5pp. Self-critique fails because the same model that made the error also evaluates it.
Source: repo:beliefs-pi/entries/2026/05/06/generate-and-critique-llms-are-half-a-mind.md