Agentic AI

Anthropic unveils Claude 4 models with coding and agentic AI upgrades

23 May 2025
OpenAI rival Anthropic has unveiled Claude 4, its next-gen foundation model featuring improved code capabilities, advanced reasoning, and tools for users to build their own AI agents.
Anthropic's Claude app displayed on a smartphone

Anthropic unveiled two new models in its Claude family: Claude Opus 4 and Sonnet 4. Opus 4 is the larger of the two, with the startup claiming it’s “the world’s best coding model”, while Sonnet 4 is designed to be a significant upgrade over the prior Sonnet 3.7.

The latest releases generate responses far faster despite extended reasoning. They’re also compatible with new offerings in Anthropic’s API that let developers build more powerful AI agents.

“These models are a large step toward the virtual collaborator—maintaining full context, sustaining focus on longer projects, and driving transformational impact,” Anthropic’s announcement page reads. “We’re excited to see what you’ll create.”

The Amazon- and Google-backed startup looks to be the latest firm to place its bets on agentic AI: agent-based systems capable of performing tasks on behalf of a user.

The new models are designed to think more extensively when using tools, with Anthropic suggesting they’re 65% less likely to use shortcuts or loopholes to complete tasks than the previous-generation Sonnet 3.7.

Claude 4 systems also come with improved memory to help with agentic tasks. When given access to local files, Opus 4 can create and maintain what Anthropic described as ‘memory files’ that record key information, which the startup said helps it with long-term task awareness and performance.
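
As a rough illustration of the ‘memory file’ idea, the Python sketch below shows one way an agent could record key information in a local file and read it back in a later session. It is not Anthropic’s implementation: the `MemoryFile` class, the JSON format, and the file name `memory.json` are all assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


class MemoryFile:
    """Hypothetical helper: persists an agent's notes as JSON on local disk."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        # Return saved notes, or an empty dict if nothing has been recorded yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def record(self, key, note):
        # Merge one new note into the file so it survives across sessions.
        notes = self.load()
        notes[key] = note
        self.path.write_text(json.dumps(notes, indent=2), encoding="utf-8")


def demo():
    # Use a temporary directory so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "memory.json"  # assumed file name
        MemoryFile(path).record("current_goal", "reach the next town")
        # A fresh instance stands in for a later session reading the same file.
        return MemoryFile(path).load()
```

Calling `demo()` writes one note and then reads it back through a second `MemoryFile` instance, which is the property that matters for long-running tasks: the notes live on disk rather than in the model’s working context.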

To show off the model’s agentic capabilities, Anthropic showcased Claude Opus 4 autonomously playing Game Boy Colour-era Pokémon games, like Gold and Silver.

Anthropic's Claude 4 improved memory in action: When given access to local files, the model records key information to help improve its game play | Credit: Anthropic

To further boost its ability to handle tasks as an AI agent, the new Claude 4 models feature improved code handling capabilities.

The models, which dropped days after rival OpenAI unveiled a cloud-based software engineering agent dubbed Codex, can handle background tasks via GitHub Actions and support integrations with platforms like VS Code and JetBrains.

The models’ coding abilities were touted by GitHub, with the repository platform planning to use Claude Sonnet 4 to power a new coding agent in GitHub Copilot, its AI-powered coding support tool.

Claude Opus 4 took the top spot on release for the SWE-bench and Terminal-bench benchmark tests, which are used to evaluate an AI model’s coding performance.

Graph showing Claude 4's performance on the SWE-benchmark for AI coding tasks

Anthropic said the model “delivers sustained performance on long-running tasks that require focused effort and thousands of steps, with the ability to work continuously for several hours”.

“These models advance our customers’ AI strategies across the board: Opus 4 pushes boundaries in coding, research, writing, and scientific discovery, while Sonnet 4 brings frontier performance to everyday use cases as an instant upgrade from Sonnet 3.7,” the startup said.
