Alibaba unveils Qwen3.7-Max, Microsoft cuts Claude Code use for devs

Welcome back. If you're stressing about your Claude Code bill, you're not alone. Microsoft is reportedly cutting off its teams' access to Claude Code, citing rising costs. And it isn't the first to feel the squeeze: Uber recently said it has already blown through its planned AI budget for the year. No wonder Anthropic's revenue is reportedly set to more than double to $10.9 billion in Q2 2026.

Also: How to stop agents from losing context, the most-used skill by Cursor engineers, and why ClickUp cut 22% to pay AI engineers $1M.

YOUR OPINION MATTERS

You are the reason our team spends hundreds of hours every week writing this email. This newsletter exists to serve you! Please answer a few short questions below to help us create better emails for you:

→ Take our 60-second reader survey

TODAY IN PROGRAMMING

Click here to see the full benchmarks for Alibaba’s Qwen3.7-Max.

Alibaba unveils a new model for long-horizon coding: The Chinese cloud giant just rolled out Qwen3.7-Max, a closed-weight model that uses thousands of steps for deep reasoning. Its million-token context window is four times larger than the previous version's, which is enough to hold an entire mid-sized repository. The model now sits fifth on the Artificial Analysis Intelligence Index, ahead of Gemini 3.5 Flash but trailing Claude Opus 4.7 and GPT-5.5. Developers can sign up for API access now.

OpenAI ships goal-tracking and Mac shortcuts to Codex: The ChatGPT maker just released its sixth weekly batch of Codex updates. A new Mac feature called Appshots lets engineers attach any app window to a thread with a quick shortcut, pulling in a screenshot plus the full text, even content that's off-screen. Additionally, the /goal command is no longer experimental, letting Codex chase a single milestone for hours or days while developers simply check in or provide guidance.

Microsoft drops Anthropic's coding tool over rising costs: The Windows maker is reportedly cutting Claude Code from its engineering teams by June 30, citing token-based costs that have become too high to sustain. The move sends thousands of developers across Windows, Office, Outlook, Teams, and Surface back to the GitHub Copilot CLI, despite an internal preference for Anthropic's tool over in-house options. Claude models will remain available inside Copilot CLI.

PRESENTED BY IBM

Modernize Java in days, not months

IBM Bob is the AI development partner built for the modernization work you need to get done:

• Java upgrades
• COBOL refactors
• RPG
• Mainframe modernization

Blue Pearl compressed a 30-day Java upgrade to 3 days with zero post-deployment defects. IBM Bob ingests your codebase and your standards, then takes on the unglamorous work of refactoring legacy code. You can direct IBM Bob to dynamically route tasks to a suitable model based on accuracy, performance, and cost, drawing on a mix of frontier models including Anthropic Claude, Mistral, IBM Granite, and others.

See what IBM Bob does with your legacy code

INSIGHT

Your AI agent is failing one layer deeper than you think. Here’s why:

Source: The Code, Superhuman

Agents fall short where chatbots succeed. Building a chatbot is mostly prompting, but building an AI agent for code, browsing, or customer workflows increasingly requires actual training. Engineering teams keep hitting the same wall: the agent performs well in testing but breaks in production. While the natural instinct is to blame the training framework, the real bottleneck may sit a layer deeper.

Environments are that hidden layer. When you're training an AI agent, the model acts within a simulated workspace, and a grading system determines if those actions worked. That grading system is the environment. A training framework is only as good as the signals it receives. A weak environment will produce a weak agent, regardless of how clean your code is.

Anthropic just proved this. In November, researchers showed that models trained on their own production coding environments learned to exploit weak scoring rules so effectively that the "cheating" led to broader misbehavior. Scale AI has made the same case, arguing that AI progress is now bottlenecked by the quality of the environments the models are trained in, not the data they consume.

Build the environment first. You have to do the unglamorous work before you even start training. This means defining success clearly, offering partial credit, separating action validity from task completion, and logging every single attempt to keep reward hacking visible. For a practical guide, this essay walks through building the entire loop in pure Python.
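To make that checklist concrete, here is a minimal sketch of a toy environment in plain Python. It is not taken from the essay above or from any training framework. The names `ToyEnvironment`, `StepResult`, `ALLOWED_ACTIONS`, and the three-milestone task are all hypothetical, chosen only to show success defined up front, partial credit, action validity kept separate from task completion, and a log of every attempt.

```python
from dataclasses import dataclass, field

# Hypothetical action vocabulary for a toy coding agent.
ALLOWED_ACTIONS = {"read", "edit", "test"}


@dataclass
class StepResult:
    valid: bool   # action validity: was this an allowed action at all?
    score: float  # partial credit: fraction of milestones reached so far
    done: bool    # task completion: were all milestones reached?


@dataclass
class ToyEnvironment:
    # Success defined up front: these milestones, in this order (assumed).
    milestones: tuple = ("read", "edit", "test")
    progress: int = 0
    log: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        return self.progress == len(self.milestones)

    def step(self, action: str) -> StepResult:
        valid = action in ALLOWED_ACTIONS
        # Only a valid action that matches the next milestone earns progress.
        if valid and not self.done and action == self.milestones[self.progress]:
            self.progress += 1
        result = StepResult(valid, self.progress / len(self.milestones), self.done)
        # Log every attempt, including invalid or unproductive ones.
        self.log.append((action, result))
        return result


env = ToyEnvironment()
for action in ["delete", "read", "test", "edit", "test"]:
    print(action, env.step(action))
```

Because `valid`, `score`, and `done` are reported separately, an agent that spams allowed-but-useless actions shows up in `env.log` as many valid steps with a flat score, which is the kind of pattern you want visible before it turns into reward hacking.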

IN THE KNOW

What’s trending on socials and headlines


Inside Cursor: A Cursor engineer revealed the most-used skill inside the company right now (nearly 5K bookmarks).
Clean Handoff: This viral walkthrough shows you how to pass work between agents without burning through your context window.
Reward Revolution: Andrej Karpathy's RL prediction is coming true. Devs can now define rewards in plain English instead of manually coding scoring functions (2.7K bookmarks).
Beyond Code: An OpenAI engineer dropped a detailed guide on pushing Codex well past coding tasks into background workflows and automations.
Sleep Shipping: This engineering leader's guide breaks down how Stripe, Ramp, and Spotify ship code overnight using autonomous background agents.
Double Up: Cursor's CEO just announced a deal that doubles usage limits for new Teams plan users this month (51M views).
Anthropic Lands Karpathy: Andrej Karpathy joined Anthropic, and this viral analysis argues his career has mapped AI's center of gravity for a decade.
$1M Salaries: ClickUp's CEO cut 22% of staff and announced new salary bands for engineers delivering 100x impact with AI.

AI CODING HACK

How to ask Codex a quick side question mid-task

Every time you ask Codex a quick question mid-task, it gets buried in the main thread. Long sessions get messy fast. OpenAI just rolled out a fix: /side. Now, in any active session, you can just type your tangent after the command:

/side Does this migration plan have an obvious risk?

Codex spins up a fresh thread with your full transcript context, keeping the parent thread clean.

Use it to check risks or debug errors without losing your spot. Close it when you're done, or use /fork if you want to take that new direction permanently.

P.S. Get 50+ AI coding hacks for Claude Code, Cursor, and Codex here.

TOP & TRENDING RESOURCES

Click here to watch the tutorial.

Top Tool

Drizz: An AI test automation tool for Android, iOS, and mobile web apps. It handles creating, running, and maintaining end-to-end functional test cases with minimal manual effort.

Top Repo

Multica: This open-source platform transforms coding agents into true teammates. It allows you to assign tasks, track progress, and build collective skills, making it easy to manage your entire workforce of humans and agents in one place.

Trending Cookbook

The unreasonable effectiveness of HTML (by Anthropic): As AI agents take on more complex tasks, traditional Markdown outputs can start to feel limited and hard to read. Interactive HTML files, by contrast, make information more engaging and visually organized. You’ll learn how to use HTML to produce richer, more readable outputs that are easy to share.

Our most-clicked story from yesterday

OpenAI quietly dropped a PDF on how their engineers use Codex every day, with workflows you can adopt for your own team.

Grow customers & revenue: Join companies like Google, IBM, and Datadog. Showcase your product to our 290K+ engineers and 150K+ followers on socials. Get in touch.

What did you think of today's newsletter?

Your feedback helps us create better emails for you!

You can also reply directly to this email if you have suggestions, feedback, or questions.

Until next time — The Code team