self-critique-harmful

Status: IN

Revising an answer based on the LLM's own self-critique makes answers worse: Sonnet -11pp, Flash -21pp, Pro -56.5pp. Self-critique fails because the model that made the error is also the one evaluating it.
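
The "pp" figures above are percentage-point changes. A minimal sketch of how such a delta could be computed, assuming the underlying metric is accuracy over a fixed question set (the entry does not say). The function name `pp_change` and the counts are hypothetical, not taken from the source:

```python
def pp_change(correct_before: int, correct_after: int, total: int) -> float:
    """Percentage-point change in accuracy after a revision pass.

    Assumes both passes are scored on the same `total` questions.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (correct_after - correct_before) / total


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not figures from the source.
    delta = pp_change(correct_before=70, correct_after=55, total=200)
    print(f"{delta:+.1f}pp")
```

A negative result means the revised answers scored lower than the first drafts. Percentage points are an absolute difference between two rates, not a relative drop.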

Source: repo:beliefs-pi/entries/2026/05/06/generate-and-critique-llms-are-half-a-mind.md

Depended on by

generate-and-critique
