
Welcome back. Well, that didn’t take long. Just a week after Google and OpenAI released coding models that topped the charts, Anthropic has won back devs with a new Opus 4.5 model that has smashed coding benchmarks. We’ve tested it internally, and this is hands down the best coding model you can work with.
Also: How to start with Claude Code, how to use AI across the software development life cycle, and an in-depth guide to help you ship products with Cursor.
Today’s Insights
New model and tools for devs
How to fairly evaluate developer performance
Developer tutorial for Nano Banana Pro
Trending social posts, top repos, new research & more
Welcome to The Code. This is a 2x weekly email that cuts through the noise to help devs, engineers, and technical leaders find high-signal news, releases, and resources in 5 minutes or less. You can sign up or share this email here.

THIS WEEK IN PROGRAMMING
Claude Opus 4.5 is the new smartest coding model: The new model secured the top spot on SWE-bench Verified (80.9%). Pricing is also surprisingly low at $5/$25 per million input/output tokens. The update also introduces "Plan Mode" for upfront architecture planning and a new desktop app for background tasks. Anthropic also announced:
Tool Search Tool, which lets Claude discover tools on demand via search instead of loading thousands of tool definitions into its context window
Programmatic Tool Calling, which lets Claude invoke tools from within a code execution environment, keeping intermediate results out of the model's context window
Tool Use Examples, which provide a standard way to show the model concrete examples of how to call a given tool effectively
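To make the programmatic tool calling idea concrete, here is a minimal sketch of the pattern it enables. This is an illustration of the concept, not Anthropic's actual API: the tool and helper names are hypothetical stand-ins.

```python
# Hypothetical sketch of "programmatic tool calling": rather than routing
# every tool result through the model's context window, the model emits a
# small program that calls tools inside a code execution sandbox, and only
# the final summary re-enters the context. Names here are illustrative,
# not Anthropic's actual API.

def get_ticket_status(ticket_id: int) -> str:
    """Stand-in for a real tool call, e.g. an issue-tracker lookup."""
    return "open" if ticket_id % 2 else "closed"

def model_generated_program() -> str:
    """Code the model might write to run in the execution environment.

    The 1,000 intermediate tool results never touch the context window;
    only the one-line summary returned here goes back to the model.
    """
    open_count = sum(
        1 for i in range(1, 1001) if get_ticket_status(i) == "open"
    )
    return f"{open_count} of 1000 tickets are open"

print(model_generated_program())  # only this summary enters the context
```

The payoff is that a loop over 1,000 tool calls costs the model one line of context instead of 1,000 tool results.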
Cursor levels up code editing with AI-powered reviews: The AI code editor dropped version 2.1, headlined by an in-editor AI code review feature that catches bugs directly in your workspace. Cursor now asks clarifying questions through an interactive UI when creating plans, while instant grep speeds up codebase searches across all models.
OLMo 3 sets new standard for open-source AI transparency: AI2's OLMo 3 family breaks new ground by releasing the entire "model flow" — every dataset, checkpoint, and training decision behind the 7B and 32B models. Developers can now trace model behaviors back to source data and customize training stages for specialized apps.

TRENDS & INSIGHTS
What Engineering Leaders Need to Know This Week

Source: The Code, Superhuman
How to use AI across the software development life cycle (by OpenAI): Coding agents can now handle 2+ hours of sustained work with 50% accuracy, and that capability is doubling every seven months. OpenAI's new guide details how teams are using AI across the entire development lifecycle, from planning to deployment.
How to fairly evaluate developer performance: DX CTO Laura Tacho argues that tracking commits, story points, and PRs to measure individual developer performance backfires spectacularly. Applied to individuals, these output metrics kill trust and invite gaming. She suggests working backwards from job responsibilities to find metrics that actually matter, such as project delivery, quality, and cross-functional feedback.
How to organize your startup engineering team: A new guide from startup advisor Marc Gauthier outlines six organizational models, ranging from traditional technical teams to squad-based structures. He argues that a setup combining staff engineers with chapter-based work offers the best balance between shipping new features and managing technical debt.

PRESENTED BY YOU.COM
Without a clear roadmap, AI adoption often fizzles, leaving ROI unproven and teams frustrated. The 90-Day AI Adoption Playbook from You.com gives you an effective, phased roadmap to move from pilot to real, measurable ROI in just 90 days.
What you'll get:
A week-by-week rollout plan for secure, scalable AI adoption
Certification steps to ensure every user delivers impact
Set your AI investment up for success: download the playbook and make your first 90 days count.

IN THE KNOW
What’s trending on socials and headlines

Meme of the week
Unfair advantage: The 90-hour Data Structures and Algorithms course by an ICPC world champion is taking the internet by storm.
LLMs decide: AI heavyweight Andrej Karpathy forms a council of LLMs that scrutinize each other’s answers and select the best one.
ML lifestyle: A Meta ML engineer shares what it’s really like to work as an AI/ML engineer.
Cursor Codes: A developer released a “build-from-scratch” guide for Cursor to create production apps with agents, architecture, and revenue-ready features.

TOP & TRENDING RESOURCES
3 Tutorials to Level Up Your Skills
Claude Code masterclass: Developers can master Claude Code with this four-act blueprint, learning nine advanced hacks like using claude.md for context management and deploying multi-agent workflows. The tutorial demonstrates how to replicate complex app functionalities and use API documentation to build production-ready software, empowering you to scale output as a "one-person company."
Complete developer tutorial for Nano Banana Pro: This comprehensive guide teaches developers how to tap into the model's reasoning process, ground generations with live search data, and create print-quality 4K images. It also covers mixing up to 14 images and generating multilingual text within visuals.
How to write effective agents.md files: After analyzing 2,500+ repositories, GitHub found that successful agents.md files share key traits. They use specific personas, executable commands, code examples, and clear boundaries instead of vague instructions like "helpful coding assistant."
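To illustrate those traits, a minimal agents.md might look like the sketch below. The project, commands, and paths are hypothetical; the point is the shape: a specific persona, runnable commands, and explicit boundaries instead of vague instructions.

```markdown
# Agent instructions

## Persona
You are a senior Go backend engineer working on this payments service.

## Commands
- Build: `make build`
- Test: `make test` (run before every commit)

## Boundaries
- Never edit files under `vendor/`.
- Do not change the public API in `api/v1/` without asking first.
```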
Top Repos
claude-cookbooks: A collection of notebooks and recipes showcasing some fun and effective ways of using Claude.
hl: A fast and powerful log viewer and processor that converts JSON or logfmt logs into a clear, human-readable format.
DeepCode: This repo helps you turn research papers and text prompts into working code.
Trending Papers
From shortcuts to sabotage: Anthropic's new research reveals how reward hacking in AI training can trigger broader misaligned actions, such as deceiving overseers, sabotaging safety tools, and pursuing secret goals. Experts like Ilya Sutskever called it important work, with some seeing it as proof that alignment challenges can be addressed.
Agent0: Existing agents are constrained by their dependence on human data. Agent0 overcomes this limitation using a self-evolving loop where a curriculum agent generates increasingly complex tasks for a tool-integrated executor to solve, significantly improving reasoning capabilities without requiring external datasets.
On the fundamental limits of LLMs at scale: Despite scaling successes, LLMs encounter fundamental limits like hallucination. This research mathematically proves that sheer size cannot resolve these errors due to innate computational constraints, demonstrating that future progress demands architectural innovation rather than just infinite scaling.
Whenever you’re ready to take the next step
What did you think of today's newsletter?
You can also reply directly to this email if you have suggestions, feedback, or questions.
Until next time — The Code team



