Tutorials · April 17, 2026 · 7 min read

Claude Opus 4.7: How to Migrate Your Workflow to Adaptive Thinking


Claude Opus 4.7 replaces fixed thinking budgets with adaptive thinking and five effort levels: low, medium, high, xhigh, and max. To migrate, swap your thinking parameter to type "adaptive," remove temperature/top_p/top_k from your payloads, and pick an effort level. Xhigh is now the default for most coding work in Claude Code.

Opus 4.7 shipped April 16 and I broke my integrations within the first hour. The old budget_tokens syntax is dead on 4.7 -- it throws an error, not a warning. This guide covers what actually changed, why the new model makes those gains, and how to rebuild your workflows to capture the real performance improvements instead of just patching broken calls.

What changed between thinking budgets and adaptive thinking?

Thinking budgets gave you a hard ceiling on how many tokens Claude could spend reasoning before responding. Adaptive thinking inverts this: Claude evaluates request complexity first and decides dynamically how much reasoning a problem actually needs. You control the ceiling through an effort parameter with five levels. On Opus 4.7, the old budget_tokens parameter is not accepted at all.

The old approach had a real problem: you were guessing. Set budget_tokens too low and Claude would rush through hard problems. Set it too high and you'd burn tokens on simple tasks that didn't need deep reasoning. Adaptive thinking fixes this by letting the model evaluate complexity per-request. On bimodal workloads -- where some requests are trivially easy and others are genuinely hard -- the token savings are dramatic. Box's Head of AI reported that after migrating from Opus 4.6 to 4.7, their pipeline consumed 30% fewer AI Units and made 56% fewer model calls and 50% fewer tool calls, while running 24% faster. Those are not marginal gains.
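To see why bimodal workloads benefit so much, here's a back-of-the-envelope sketch. The request mix and per-request token costs are numbers I made up to show the shape of the savings, not measurements:

```python
# Illustrative comparison: fixed thinking budget vs adaptive allocation
# on a bimodal workload. All numbers are hypothetical.

def fixed_budget_tokens(n_requests: int, budget: int) -> int:
    """Fixed budget: every request is allocated the full ceiling."""
    return n_requests * budget

def adaptive_tokens(n_requests: int, easy_share: float,
                    easy_cost: int, hard_cost: int) -> int:
    """Adaptive: easy requests get shallow reasoning, hard ones get depth."""
    n_easy = int(n_requests * easy_share)
    n_hard = n_requests - n_easy
    return n_easy * easy_cost + n_hard * hard_cost

# Ceiling has to be sized for the hard requests, so every easy request overpays.
fixed = fixed_budget_tokens(1000, 10_000)
# 70% easy requests at ~500 thinking tokens, 30% hard at the full 10,000.
adaptive = adaptive_tokens(1000, 0.7, easy_cost=500, hard_cost=10_000)
```

With this mix, adaptive allocation spends roughly a third of the fixed-budget total -- the more lopsided your workload, the bigger the gap.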

Boris Cherny, who created Claude Code, noted after the Opus 4.7 launch that the model is roughly 2-3x more capable than 4.6 on complex work. That also means it can be wrong in more ambitious ways. Adaptive thinking is what makes scaling sustainable -- Claude allocates reasoning proportionally instead of uniformly, so you're not paying for ceiling reasoning on every single request.

What are the five effort levels and when should you use each one?

The five effort levels -- low, medium, high, xhigh, and max -- give you a spectrum from speed-optimized to capability-maxed. Low is for classification, routing, or high-volume lookups where latency matters and reasoning depth doesn't. Medium gives solid performance without full token expenditure. High is the default for complex reasoning and most coding work. Xhigh is new in 4.7 and sits between high and max, designed specifically for long agentic loops with repeated tool calls. Max is the ceiling -- deepest reasoning, no latency or cost constraint.

The breakdown I've settled on after running 4.7 in production: use medium for tasks you'd call quick (PR summaries, commit messages, short analyses). Use high for most feature implementation work. Use xhigh for multi-step agentic runs where Claude is doing extended tool call sequences and needs to hold coherent context across a long loop. Max for genuinely hard architecture or research problems where cost isn't the constraint. One practical note: Opus 4.7 respects effort levels strictly, especially at the low end. At low and medium effort, the model scopes its work literally to what was asked. If you see shallow reasoning on a complex problem, raise the effort level -- don't try to prompt around it.
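If you want to encode that breakdown programmatically, a small dispatch table is enough. The task categories and the mapping are my own working taxonomy, not anything official:

```python
# Effort-level router based on the breakdown above.
# Task categories and mappings are illustrative, not an official taxonomy.
EFFORT_BY_TASK = {
    "classification": "low",
    "routing": "low",
    "pr_summary": "medium",
    "commit_message": "medium",
    "feature_implementation": "high",
    "agentic_loop": "xhigh",
    "architecture_research": "max",
}

def effort_for(task_type: str) -> str:
    """Return an effort level for a task type, defaulting to 'high'."""
    return EFFORT_BY_TASK.get(task_type, "high")
```

Defaulting unknown task types to high matches the model's own default, so an unclassified task degrades to normal behavior rather than shallow reasoning.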

In Claude Code specifically, xhigh is now the default for all plans. Anthropic determined that xhigh is the best balance for most coding tasks -- enough reasoning depth to handle complex implementation without the full cost of max. You can override this in your CLAUDE.md settings or via the model configuration command.

Want the templates from this tutorial?

I share every workflow, prompt, and template inside the free AI Creator Hub on Skool. 500+ builders sharing what actually works.

Join Free on Skool

How do you migrate your API calls from Opus 4.6 to 4.7?

The migration requires three changes: replace the thinking parameter syntax, add the effort parameter, and strip temperature/top_p/top_k from your payloads. On Opus 4.7, these sampling parameters are no longer accepted. The new thinking type is "adaptive" instead of "enabled," and budget_tokens is gone entirely.

Here is what a request looked like under Opus 4.6 with extended thinking:

{
  "model": "claude-opus-4-6-20250514",
  "max_tokens": 16000,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  },
  "temperature": 1,
  "messages": [{"role": "user", "content": "..."}]
}

Here is the equivalent request under Opus 4.7:

{
  "model": "claude-opus-4-7-20260416",
  "max_tokens": 16000,
  "thinking": {
    "type": "adaptive"
  },
  "effort": "xhigh",
  "messages": [{"role": "user", "content": "..."}]
}

That covers the core migration. No temperature, no budget_tokens, new effort field. If your code checks thinking.type === "enabled" anywhere (for detecting reasoning mode), update those checks to "adaptive".
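Since the changes are mechanical, you can script the conversion. A sketch of a payload rewriter based on the two request bodies above -- the function name and defaults are mine:

```python
# Sketch: convert an Opus 4.6-style request body to the 4.7 shape.
# Field names come from the example payloads; the helper itself is illustrative.

def migrate_payload(old: dict, effort: str = "high",
                    model: str = "claude-opus-4-7-20260416") -> dict:
    """Swap thinking to adaptive, add an effort level, and strip the
    sampling parameters that Opus 4.7 no longer accepts."""
    new = {k: v for k, v in old.items()
           if k not in ("temperature", "top_p", "top_k", "thinking", "model")}
    new["model"] = model
    new["thinking"] = {"type": "adaptive"}  # budget_tokens is gone entirely
    new["effort"] = effort
    return new

old_request = {
    "model": "claude-opus-4-6-20250514",
    "max_tokens": 16000,
    "thinking": {"type": "enabled", "budget_tokens": 10000},
    "temperature": 1,
    "messages": [{"role": "user", "content": "..."}],
}
new_request = migrate_payload(old_request, effort="xhigh")
```

Running old payloads through a converter like this at the edge of your stack lets you migrate call sites incrementally instead of in one risky sweep.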

One new concept in 4.7 worth knowing about: task budgets. These are separate from effort levels. A task budget gives Claude a rough target for how many tokens to use across an entire agentic loop, including thinking, tool calls, tool results, and final output. You set it as a rough estimate, not a hard cap. If your agents consistently over-run or abandon tasks too early, task budgets are the right lever. They're optional -- most teams won't need them unless they're running very long or very expensive agentic loops.
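The mechanics are server-side, but the idea is easy to see in a client-side sketch: track cumulative usage across a loop against a soft target, and wind down rather than hard-stop when you cross it. Everything below is my own illustration of the concept, not the API's implementation:

```python
# Illustrative only: a task budget as a rough target, not a hard cap.
# Steps and token costs are hypothetical.

def run_agent_loop(steps, task_budget: int):
    """Drive (name, token_cost) steps; once usage crosses the soft target,
    stop starting new work instead of cutting off mid-step."""
    used = 0
    completed = []
    for name, cost in steps:
        if used >= task_budget:  # soft target reached: no new steps
            break
        used += cost             # a real loop would read usage from API responses
        completed.append(name)
    return completed, used

steps = [("plan", 2_000), ("edit", 5_000), ("test", 4_000), ("review", 6_000)]
done, used = run_agent_loop(steps, task_budget=10_000)
```

Note the "test" step still runs even though it pushes usage past 10,000 -- that's the soft-target behavior: overshoot on an in-flight step is fine, starting fresh work past the target is not.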

What do the benchmarks actually show?

Opus 4.7 scored 87.6% on SWE-bench Verified, up from 80.8% on Opus 4.6. On SWE-bench Pro, which tests real open source software issues rather than curated samples, it hit 64.3% -- ahead of GPT-5.4 at 57.7% and Gemini 3.1 Pro at 54.2%. Anthropic's internal 93-task coding benchmark showed a 13% improvement, with 4 tasks solved that neither Opus 4.6 nor Sonnet 4.6 could crack at all.

The multi-step story is stronger than the single-task numbers. Opus 4.7 delivers a 14% improvement on complex multi-step workflows while using fewer tokens and producing a third of the tool errors compared to 4.6. That error reduction is the part I care about in production -- fewer tool errors means fewer recovery loops, which compounds into real latency and cost savings at scale.

Opus 4.7 is also the first Claude model to pass what Anthropic calls implicit-need tests -- tasks where the model must infer what tools or actions are needed without being told explicitly. In practice this means less scaffolding in your system prompts. I've started removing explicit tool routing instructions from several Claude Code configurations and the model handles routing on its own at xhigh effort.

What else shipped in Claude Code with Opus 4.7?

Three new Claude Code features shipped alongside Opus 4.7: recaps, focus mode, and the /ultrareview command. Recaps are short summaries Claude generates of what it did and what comes next -- useful in long sessions where you need to re-orient without reading through hundreds of tool calls. Focus mode hides intermediate work and shows only final output. The /ultrareview command runs a deep code review pass covering security, logic, performance, and style in one shot.

Cherny's post on the launch (which hit 135K impressions within hours) made one point that stuck with me: at 2-3x the capability, verification matters more than ever, not less. When Claude is just running simple queries, errors are small. When it's running complex multi-step builds, errors compound. The recaps feature is partly a hedge against this -- it gives Claude a structured self-check point between major phases of long tasks.

Auto mode is also now available as a safer alternative to full permission grants for long-running tasks. Instead of authorizing all tool use upfront, auto mode lets Claude request permissions as it encounters them. For agentic pipelines where you're not sure exactly what tools will be needed, auto mode reduces the blast radius of unexpected behavior.

What is the tokenizer change and why does it matter for your costs?

Opus 4.7 ships with an updated tokenizer that encodes the same input into 1 to 1.35 times as many tokens, depending on text type. For cost-sensitive pipelines, this is the migration gotcha that will surprise you if you don't benchmark first. A prompt that cost a certain amount under Opus 4.6 could cost up to 35% more under 4.7 from tokenization alone, before capability improvements factor in.
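Quick arithmetic on that worst case, using the unchanged $5 per million input token price (the prompt size is an example, not a measurement):

```python
# Worst-case input cost impact of the 1.35x tokenizer expansion.
# The 100K-token prompt is a hypothetical example.

def input_cost_usd(tokens: int, per_million: float = 5.0) -> float:
    """Input-side cost at Opus pricing ($5 per million input tokens)."""
    return tokens / 1_000_000 * per_million

old_tokens = 100_000                 # prompt size under the Opus 4.6 tokenizer
new_tokens = int(old_tokens * 1.35)  # worst-case expansion under 4.7
delta = input_cost_usd(new_tokens) - input_cost_usd(old_tokens)
```

That's $0.675 versus $0.50 per call on input tokens alone -- small per request, but it compounds fast across a high-volume pipeline.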

This also affects context window math. Opus 4.7 has a 200K context window. If you're running prompts near that ceiling, code-heavy inputs (which see the largest token-count increase under the new tokenizer) will consume more of that window than equivalent Opus 4.6 prompts. My recommendation: benchmark your five most expensive prompt templates against the new tokenizer before moving production traffic. Anthropic provides a tokenizer count endpoint -- use it before committing to migration at scale.

At $5 per million input tokens and $25 per million output tokens -- pricing is unchanged from 4.6 -- the capability gain justifies the tokenizer overhead for most production use cases. Box's real-world data showing 30% fewer AI Units is a strong signal that the efficiency gains outpace the tokenizer cost for complex workloads. But verify it for your specific prompts, not just the benchmark results.

FAQ

Can I still use budget_tokens on Claude Opus 4.7?

No. On Opus 4.7, the thinking: {type: "enabled", budget_tokens: N} parameter is not accepted and will throw an API error, not a warning. The budget_tokens parameter remains supported on Opus 4.6 and Sonnet 4.6 but is deprecated and will be removed in a future release. On Opus 4.7, use thinking: {type: "adaptive"} combined with the effort parameter to control reasoning depth.

What effort level should I start with when migrating?

Start with high -- it's the default and maps closest to the behavior of a well-tuned budget_tokens setting on Opus 4.6. If response latency is too high for your use case, drop to medium. If you're running long agentic loops with repeated tool calls, move up to xhigh, which is now the default in Claude Code. Only use max when you need the absolute capability ceiling with no cost constraint.

Does adaptive thinking use more tokens than a fixed thinking budget?

For mixed workloads it typically uses fewer tokens, because adaptive thinking skips extended reasoning on simple requests rather than always allocating the full ceiling. Box's production data showed 30% fewer AI Units consumed after migrating from Opus 4.6 to 4.7, with 56% fewer model calls and 50% fewer tool calls. For uniform high-difficulty tasks, token usage is comparable to a matched fixed budget.

Should I migrate to Opus 4.7 now or wait?

If you're running production agentic workflows that involve multi-step reasoning or repeated tool calls, yes -- the efficiency gains are real and the pricing is unchanged. If you're running simple classification or Q&A pipelines, Sonnet 4.6 or Haiku 4.5 are still better cost choices. Migrate a non-critical workload first, benchmark the tokenizer impact on your prompts, then move production traffic once you've validated the cost profile.
