Anti-confirmation bias

Claude doesn't grade Claude

Code review runs on a different framework than the one that wrote the code. Not by convention. By architectural invariant.

The failure mode

When an LLM Agent reviews its own output, the failure looks exactly like success. The same model that produced the PR is asked whether the PR is good, and the same biases that shaped the code shape the review. There's no contradiction, no missed assumption, no second opinion. Just a confirmatory loop that gets faster the more it's run.

This compounds where it hurts most: at merge. The cost of a bad merge isn't bounded by the cost of a bad review. It's bounded by everything downstream that depends on the merged code being correct. By the time the bias shows up in production, the trail back to the review is cold.

The play templates carry plenty of anti-slop prompting, too. But cross-framework review is what catches what the prompts miss.

The architectural rule

Swink AgentShore enforces two layers of separation, in the executor and not the prompt:

Framework separation. If the author was a Claude-family LLM Agent, the reviewer must come from a different framework (Codex, an open-weight model, a separate vendor). The hard-gate catalog blocks same-framework review for the same PR.
Identity isolation. Reviewer LLM Agents use GitHub SSH keys that are independent of the authors' keys. The git history shows two different signing identities. Automated rubber-stamping is structurally prevented. Not "discouraged."
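A minimal sketch of how the two layers could be checked together, assuming each agent is described by its framework and the fingerprint of its signing key. The AgentIdentity record, its field names, the "claude-other" placeholder model, and the fingerprint strings are illustrative assumptions, not AgentShore's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    """Hypothetical record of an agent; field names are illustrative."""
    model: str            # e.g. "claude-sonnet-4.6"
    framework: str        # e.g. "claude", "codex"
    key_fingerprint: str  # fingerprint of the SSH key used to sign commits


def separation_violations(author: AgentIdentity, reviewer: AgentIdentity) -> list[str]:
    """Return the separation rules this pair breaks; an empty list means allowed."""
    violations = []
    if author.framework == reviewer.framework:
        violations.append("framework separation: reviewer shares the author's framework")
    if author.key_fingerprint == reviewer.key_fingerprint:
        violations.append("identity isolation: reviewer shares the author's signing key")
    return violations


# Placeholder values for illustration only.
author = AgentIdentity("claude-sonnet-4.6", "claude", "SHA256:author-key")
reviewer = AgentIdentity("codex-medium", "codex", "SHA256:reviewer-key")
same_family = AgentIdentity("claude-other", "claude", "SHA256:other-key")

print(separation_violations(author, reviewer))     # no rule broken
print(separation_violations(author, same_family))  # framework separation broken
```

The two checks are independent on purpose in this sketch: a same-framework reviewer with a fresh key still fails, and so does a different framework that somehow reuses the author's key.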

What this looks like at runtime

Author

claude-sonnet-4.6

Picked up the issue, wrote the implementation, opened the PR. Signs commits with its own SSH key.

Reviewer

codex-medium

Different framework, different identity, different prompt template. Reviews against an explicit checklist, not "does this look right".

The policy can't route around it. The Code Review play has a hard precondition: reviewer.identity != author.identity. Because each agent framework runs under its own GitHub identity, a different identity means a different framework. If no eligible reviewer is available right now, the play is masked. The policy then picks something else, or instantiates the right LLM Agent first.
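The masking behaviour can be sketched as a filter over the plays the policy is allowed to choose from. This is a sketch under stated assumptions: the Agent record, the play names ("code_review", "instantiate_agent", "implement_issue"), and the GitHub identity strings are hypothetical and stand in for whatever the executor actually tracks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    identity: str    # GitHub identity the agent runs under (hypothetical field)
    available: bool  # whether the agent can take work right now


def eligible_reviewers(author: Agent, pool: list[Agent]) -> list[Agent]:
    """Agents that are free now and satisfy reviewer.identity != author.identity."""
    return [a for a in pool if a.available and a.identity != author.identity]


def unmasked_plays(author: Agent, pool: list[Agent], plays: list[str]) -> list[str]:
    """Remove "code_review" from the selectable plays when nobody can review."""
    if eligible_reviewers(author, pool):
        return list(plays)
    return [p for p in plays if p != "code_review"]


author = Agent("claude-sonnet-4.6", "gh-claude-bot", available=True)
busy_codex = Agent("codex-medium", "gh-codex-bot", available=False)
free_codex = Agent("codex-medium", "gh-codex-bot", available=True)
plays = ["code_review", "instantiate_agent", "implement_issue"]

# The only other identity is busy, so the review play is masked.
print(unmasked_plays(author, [author, busy_codex], plays))
# Once a reviewer with a different identity is free, the play is selectable again.
print(unmasked_plays(author, [author, free_codex], plays))
```

Masking removes the play before the policy chooses, so there is no rejected action to retry or argue with. The remaining plays, including the one that brings up a suitable reviewer, stay available.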

Why this stays even on human override

Most autonomy gates can be overridden by a human. The anti-confirmation rule cannot. The reasoning is unglamorous: humans are worse at catching confirmation bias when they are under time pressure, and "just this once" is the most common shape of an incident postmortem.

Hard gate: Code Review reviewer must not be the PR author.
Hard gate: Cross-framework requirement is not overridable from the CLI.
Hard gate: Merge requires an approved PR and passing CI before it can run.
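One way to picture the difference between a hard gate and an overridable one is a catalog in which each gate carries an explicit overridable flag. The sketch below assumes all three hard gates listed above ignore human override. The Gate record, the context keys, and the "diff_under_size_limit" soft gate with its 500-line threshold are invented for contrast and are not part of the real catalog.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Gate:
    play: str                      # play the gate guards
    name: str
    check: Callable[[dict], bool]  # returns True when the gate passes
    overridable: bool              # False marks a hard gate


GATES = [
    Gate("code_review", "reviewer_is_not_author",
         lambda c: c["reviewer"] != c["author"], overridable=False),
    Gate("code_review", "cross_framework",
         lambda c: c["reviewer_framework"] != c["author_framework"], overridable=False),
    Gate("merge", "approved_and_ci_passing",
         lambda c: c["approved"] and c["ci_passed"], overridable=False),
    # Hypothetical soft gate, included only to show what an override can lift.
    Gate("merge", "diff_under_size_limit",
         lambda c: c["changed_lines"] <= 500, overridable=True),
]


def blocking_gates(play: str, ctx: dict, human_override: bool = False) -> list[str]:
    """Names of failed gates that still block the play after any human override."""
    blocked = []
    for gate in GATES:
        if gate.play != play or gate.check(ctx):
            continue
        if human_override and gate.overridable:
            continue  # only soft gates can be waved through
        blocked.append(gate.name)
    return blocked


ctx = {
    "author": "gh-claude-bot", "reviewer": "gh-claude-bot",
    "author_framework": "claude", "reviewer_framework": "claude",
    "approved": True, "ci_passed": True, "changed_lines": 900,
}
print(blocking_gates("code_review", ctx, human_override=True))  # hard gates still block
print(blocking_gates("merge", ctx, human_override=True))        # soft gate lifted
```

Keeping overridability as data on the gate, not as a branch in the CLI, means an override request never reaches the hard gates at all in this sketch.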

What it doesn't claim to fix

Cross-framework review catches confirmation bias. It does not catch:

Shared training-data blind spots. If every frontier model misses the same security pattern, swapping vendors won't help. This is why QA still runs on top, as a separate large-tier validation pass.
Reward hacking. The policy can still find ways to optimize the reward without optimizing the underlying goal. That's the scope-validator's job, not the reviewer's.
Bad goals. If the issue is wrong, the PR will be wrong, and the review will be a thoughtful endorsement of the wrong thing.

The principle is narrower than "review fixes everything" and more useful than "reviewers are mostly redundant": it fixes the specific failure mode where the author and reviewer share priors.