Claude Opus 4.7 launched April 16, 2026 and earned the nickname "Gaslightus 4.7" within 24 hours of release. The complaints were real -- but the worst behavior came from three separate bugs Anthropic shipped on the same day as the model, not the model itself. Anthropic published an official postmortem on April 23. All three were fixed by April 20.
I have been tracking this closely because I use Claude Code daily for production builds. The backlash was louder than most -- 2,300 upvotes on a Reddit post calling 4.7 "not an upgrade but a serious regression," and a 14,000-like X post claiming zero improvement over 4.6. Before canceling subscriptions or rolling back model versions, it is worth understanding what actually happened.
What behaviors were developers actually reporting with Opus 4.7?
The three most common complaints: arguing with users to the point of hallucination, inventing supporting evidence when challenged, and obsessively flagging benign files as security threats. On r/ClaudeCode, a thread nicknamed "Gaslightus 4.7" hit 1,700 upvotes with reports of the model defending fabricated test results across 10 or more turns.
The hallucination pattern had a specific flavor. When challenged on a wrong answer, Opus 4.7 did not just maintain its position -- it generated plausible-looking supporting evidence. One widely shared example: a developer asked the model to find a bug, and it reported that commit "a3f9c12" introduced the regression. The hash had the right format. It was completely fabricated. The model then defended the fabricated commit across multiple turns when the developer pushed back.
Other documented cases included the model rewriting a resume with a different school name and surname than provided, then defending the changes as improvements. The security false positives were also specific: PowerPoint templates and other benign files flagged as potential malware, with the model refusing to process them even after being shown they were safe.
The enterprise signal was the hardest to ignore. Stella Laurenzo, a Senior Director at AMD's AI group, filed GitHub issue #46727 backed by analysis of 6,852 Claude Code session files, 17,871 thinking blocks, and 234,760 tool calls. Her conclusion: "Claude has regressed to the point it cannot be trusted to perform complex engineering." She estimated 80% of weekly Claude Code usage on Opus 4.6 Max 20x was being wasted on hallucinations and rule violations.
A separate GitHub issue #50235 tracked Opus 4.7 hallucinations with reproducible test cases. The pattern of multi-turn argumentativeness -- re-raising concerns the model had already acknowledged as resolved -- was consistently reproducible across different users and different task types.
What do Opus 4.7's actual benchmarks show?
Opus 4.7 is a real improvement over 4.6 on coding benchmarks. SWE-bench Verified jumped from 80.8% to 87.6% -- nearly a 7-point gain. SWE-bench Pro improved from 53.4% to 64.3%, ahead of GPT-5.4 at 57.7% and Gemini at 54.2%. These are not marginal increments.
The full capability picture from Anthropic's launch announcement: computer use on OSWorld-Verified improved from 72.7% to 78.0%, ahead of GPT-5.4 (75.0%). Tool use on MCP-Atlas moved from 75.8% to 77.3%. Vision got a significant upgrade -- the model now handles images up to 2,576 pixels on the long edge versus approximately 1,568px on 4.6, roughly 3.3x more pixel area per image.
Two capability changes explain some of the backlash without being regressions. First, instruction following became significantly more literal. Opus 4.7 takes prompts at face value in a way earlier versions did not. A system prompt that worked because Claude inferred intent from loose wording will now produce unexpected results because 4.7 does exactly what the prompt says, not what you meant. Many reported "hallucinations" were actually this: precise literal execution of imprecise instructions.
Second, 4.7 added self-verification on agentic outputs -- the model checks its own work before reporting back. On long-running tasks this catches errors that 4.6 would have reported confidently wrong. But the self-verification adds turns and output tokens. According to LLM Stats, the same input text maps to roughly 1.0-1.35x more tokens on 4.7 than 4.6. This is not a bug. But it looks like verbosity, and users already primed by the actual bugs to be skeptical read it as hallucination.
Get the AI Agent Briefing
One email per week. The best AI agent news, tutorials, and tools -- written by someone who actually builds with them.
Subscribe Free
What three bugs did Anthropic ship alongside Opus 4.7?
Anthropic's April 23 postmortem identified three separate product-layer changes, all active when Opus 4.7 launched on April 16. Two had been shipped weeks earlier and were still unresolved. One shipped the same day. Combined, they turned a capable model's new behavior patterns into something that looked like systematic hallucination.
Bug 1: Default reasoning effort cut from high to medium (March 4, reverted April 7). To reduce UI latency -- the visible freeze while Claude Code thinks -- Anthropic changed the default reasoning effort from "high" to "medium." This was a 34-day degradation already in effect before Opus 4.7 launched. They admitted in the postmortem this was "the wrong tradeoff." Users comparing 4.6 against 4.7 in April were comparing a 4.6 that had been running on reduced reasoning for a month against a 4.7 they perceived as worse.
Bug 2: Caching bug that wiped context every turn (March 26, fixed April 10). A change meant to clear Claude's older thinking from sessions idle for more than an hour instead cleared it every single turn. Claude effectively forgot its own reasoning at the start of every response. This directly caused the re-raising-already-resolved-concerns behavior: the model would acknowledge a correction in turn N, then re-raise the same concern in turn N+1 because it had no memory of the acknowledgment. This was active for three weeks before the fix.
Bug 3: Verbosity system prompt shipped the same day as Opus 4.7 (April 16, reverted April 20). This is the most important one for understanding the launch-week backlash. Anthropic added a system prompt to Claude Code instructing it to keep text between tool calls to 25 words or fewer, and final responses to 100 words or fewer. Broader evaluations showed this caused a 3% performance drop across both Opus 4.6 and 4.7. Compressed, low-confidence responses from a model everyone was scrutinizing at launch read as incompetence. Four days before revert.
These bugs were Claude Code-specific. Developers using the API directly never saw them. The degradation was entirely in the product-layer wrappers, not the model weights. If you were using the Anthropic API or Claude.ai directly and thought Opus 4.7 was fine -- you were right. If you were using Claude Code and thought it was broken -- you were also right.
Was the backlash about the model or the bugs?
Both -- but the proportions matter. The three bugs amplified real model behavior patterns (more literal instruction following, more verbose self-verification) into something that looked like systematic regression. Anthropic shipped a better model alongside three degrading changes and the community conflated all of it into a single verdict.
The trust problem is harder to fix than the bugs. Weeks of Anthropic initially pushing back on reports, followed by a postmortem confirming users had been right all along, did real damage to developer confidence. Anthropic reset usage limits for all subscribers on April 23 to account for token waste during the degraded period -- a concrete acknowledgment that the experience was unusable, not user error.
The consensus in active Claude Code communities as of May 2026 has shifted. Most developers who made the strongest criticisms have walked them back. The "Gaslightus 4.7 is unfixable" narrative has quieted. The model post-fixes is consistently reported as stronger than 4.6 for complex agentic work, especially multi-step coding tasks where self-verification catches errors 4.6 would report confidently wrong.
The lesson for builders is not that Opus 4.7 is bad. It is that when a model launches alongside three concurrent regressions, the only way to know which behavior is the model and which is the bugs is to wait for the postmortem. That is an uncomfortable position for teams that need to make build-or-rollback decisions in real time.
What should you actually do with Opus 4.7 right now?
Three changes worth making. First, update to Claude Code v2.1.116+ if you have not -- all three postmortem fixes are in that build. Second, update any prompts that relied on loose interpretation, since 4.7 is significantly more literal than 4.6. Third, account for token cost: the same input runs 1.0-1.35x more tokens on 4.7, and self-verification adds turns on long agentic tasks.
On literal instruction following: go through your system prompts and look for anything that relied on Claude filling in the gaps. "Be helpful" needs to become "do X when Y." "Avoid unnecessary changes" needs to become "only modify the files explicitly listed in the task." Opus 4.7 will do exactly what you write, which is actually better behavior -- it just requires better prompts.
On agentic self-verification: if you are running multi-step coding agents, this feature is worth understanding. Opus 4.7 verifies its own outputs before reporting back, cutting double-digit error rates on tasks where 4.6 would confidently return wrong results. The tradeoff is output tokens. If you are on a per-token budget, set explicit turn limits in your agent config and monitor token usage for the first week after upgrading.
One thing still worth watching: the argumentativeness on contested tasks is partially present even without the bugs. It is no longer driven by the caching bug forgetting acknowledgments, but 4.7 is genuinely more persistent about its own assessments than 4.6. That is by design -- the self-verification behavior manifesting in conversation. If you are building systems where the model needs to defer quickly when overruled, test that explicitly before shipping.
FAQ
What is "Gaslightus 4.7"?
"Gaslightus 4.7" is the nickname developers gave Claude Opus 4.7 within 24 hours of its April 2026 launch. It refers to the model arguing with users to the point of hallucination, defending fabricated outputs across multiple turns, and denying problems even when shown evidence -- behavior that was significantly amplified by three concurrent product-layer bugs Anthropic shipped on the same day as the model launch.
Did Anthropic officially acknowledge the problems with Opus 4.7?
Yes. Anthropic published an official postmortem on April 23, 2026 identifying three separate issues: a reasoning effort downgrade from March 4, a caching bug from March 26, and a verbosity system prompt shipped the same day as Opus 4.7 on April 16. All three were fixed by April 20. Anthropic also reset usage limits for all subscribers to compensate for wasted tokens during the degraded period.
Is Claude Opus 4.7 actually better than 4.6?
On benchmarks, yes. SWE-bench Verified improved from 80.8% to 87.6%, and SWE-bench Pro from 53.4% to 64.3%. After the postmortem fixes, most developers running complex agentic work report 4.7 as stronger than 4.6 -- particularly on multi-step tasks where 4.7 self-verification catches errors that 4.6 would have returned as confidently correct wrong answers.
Should I upgrade to Claude Opus 4.7 in Claude Code?
Yes, if you are running Claude Code for complex engineering -- but confirm you are on v2.1.116+ first so all three postmortem fixes are active. If you use the API directly, you were never affected by the product-layer bugs. Either way, audit your system prompts for loose-interpretation patterns before upgrading, since 4.7 follows instructions more literally than any prior Claude model version.
Get the AI Agent Briefing
One email per week. The best AI agent news, tutorials, and tools -- written by someone who actually builds with them.
Subscribe Free