MiniMax drops M3, OpenAI staffs up its Robotics division

Welcome back. The gap between open-source and frontier AI is closing fast, and a Chinese lab just took a major leap forward. Their new open-weights model combines top-tier coding skills with a million-token context window.

Also: How to build a Claude Code subagent with ready-to-use templates, an OpenAI Codex engineer's "chief of staff" thread that runs every project, and why Wispr Flow's CTO stopped coding.

Today’s Insights

Powerful new updates and hacks for devs
See why cheaper AI models can quietly raise your real costs
How to audit an entire codebase in one Claude Code run
Trending social posts, top repos, and more

TODAY IN PROGRAMMING

Click here to see MiniMax M3’s full coding benchmarks.

MiniMax drops an open model that rivals the best: The Chinese AI lab just unveiled MiniMax M3, what it calls the first open-weight model to pair top-tier coding with a 1M-token context window and native image and video input. On the SWE-Bench Pro test, it reportedly beats GPT-5.5 and Gemini 3.1 Pro while closing in on Opus 4.7. The model can even operate a desktop computer. API access starts at $0.60 per million input tokens, and the weights are set to open within 10 days.

Nvidia ships an open model to power physical AI: The chip giant just released Cosmos 3, an open-weight foundation model built for robots and self-driving vehicles. It combines reasoning, world generation, and action prediction into a single system, so teams working on robotics or autonomous tech can stop juggling separate models and pipelines. The compact 8B version runs on a workstation GPU, while the 32B model targets datacenters. You can find the checkpoints, training scripts, and datasets on Hugging Face and GitHub.

OpenAI staffs up to build real-world robots: The ChatGPT maker just kicked off a hiring drive for its robotics division, scouting full-stack hardware, systems, and ML engineers to build machines for the physical world. The team grew out of an internal world-simulation project over the past year. CEO Sam Altman credits that hardware-and-ML pairing for how fast progress is now moving. Its first priority: robots that help skilled workers build out infrastructure.

PRESENTED BY IBM

How agentic engineering differs from vibe coding

Developers may use AI for quick code generation, but this “vibe coding” approach can create more problems than it solves. Agentic engineering uses AI agents for specific tasks like refactoring, documentation, and reviews. The key difference: developers maintain oversight and validate outputs through multi-agent systems.

This means fast development without sacrificing code quality or accumulating technical debt.

Dive deeper

INSIGHT

Cheaper AI models won’t help you cut your AI costs. Here’s why:

Source: The Code, Superhuman

Budgets are hitting a wall by March. AI coding bills are blowing past their yearly limits, and one CTO went viral after burning through his entire 2026 budget by mid-March. So the fix spread fast. Teams started capping spend and pushing the easy work to cheaper models. On the surface, it looks like simple housekeeping.

The math for cheap models looks great. A top-tier model like Opus can cost ten times as much as a budget model for the same request. Across thousands of daily tasks, that gap adds up fast, so the savings feel impossible to pass up. The cheaper model still writes code, so why pay the premium?

The invoice hides the real cost. What matters is the cost per change that actually ships: an accepted PR or a merged fix. In Cursor's own data, Opus and GPT-5.5 land about even on that measure, even though Opus costs twice as much per request. The budget model stays cheaper, but its lead shrinks by about half once you count only code that survives review. It comes down to acceptance. Cheaper models get more of their output rejected, and each rejected change becomes rework. And the pricey part of rework, your engineers' time re-reviewing and re-prompting, never lands on the bill.
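To make the arithmetic concrete, here is a minimal Python sketch of cost per shipped change. Every number in it is hypothetical: the per-request prices only mirror the tenfold gap mentioned above, and the acceptance rates are picked for illustration, not taken from Cursor's data. The `ModelStats` class and `cost_per_shipped_change` function are names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    name: str
    cost_per_request: float  # dollars billed per request (hypothetical)
    acceptance_rate: float   # share of changes that survive review (hypothetical)


def cost_per_shipped_change(m: ModelStats, rework_cost: float = 0.0) -> float:
    """Expected spend to land one accepted change.

    On average it takes 1 / acceptance_rate attempts to ship one change,
    so (attempts - 1) of them are rejected. rework_cost is the engineer
    time, in dollars, spent per rejected attempt; it never shows up on
    the model invoice.
    """
    if not 0 < m.acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    attempts = 1 / m.acceptance_rate
    rejected = attempts - 1
    return attempts * m.cost_per_request + rejected * rework_cost


# Hypothetical inputs: a 10x price gap and a 2x acceptance gap.
premium = ModelStats("premium", cost_per_request=0.10, acceptance_rate=0.8)
budget = ModelStats("budget", cost_per_request=0.01, acceptance_rate=0.4)

invoice_gap = premium.cost_per_request / budget.cost_per_request
shipped_gap = cost_per_shipped_change(premium) / cost_per_shipped_change(budget)

print(f"per-request gap: {invoice_gap:.1f}x")
print(f"per-shipped-change gap: {shipped_gap:.1f}x")
```

With these made-up inputs, the tenfold per-request gap drops to about fivefold per shipped change. Passing a nonzero `rework_cost` (say, 0.50 per rejected attempt) tips the comparison the other way, which is the part no invoice will show you.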

The invoice isn't the scoreboard. The real question is simple: What did that spend actually ship? Cutting costs before you can answer that is just guessing. This guide to managing AI coding spend makes the case for a better move. Tie the bill to what your team ships, not to how many tokens it burns. Give engineers room to work and flag the rare runaway. The team that wins is the one that can prove what its spend bought.

IN THE KNOW

What’s trending on socials and headlines

Off the Leash: Cursor's new update lets agents run more tool calls without stopping to ask you (2K likes).
Subagent Starter: This post gets you a working Claude Code subagent in 15 minutes, with 5 copy-paste templates for code review, tests, and more.
One Thread to Rule: A Codex engineer at OpenAI runs everything through one "chief of staff" thread that spins up and monitors all his other project threads. Here's the workflow.
The Serial Lock: Addy Osmani, Director at Google Cloud AI, breaks down why spawning more agents leaves you busy but barely more productive (4.8K bookmarks).
Shipped Quietly: Buried in the Opus 4.8 launch is a Claude Code feature that turns a single task into hundreds of self-coordinating agents. Understand how it works.
Set and Forget: OpenClaw's creator turned Codex into a QA bot that tests every commit and opens pull requests with the fixes by itself.
The New Job: Wispr Flow's CTO hasn't written a line of code since December, with AI now shipping 90% of what serves 1M+ users. See what his job became.

AI CODING HACK

How to audit your whole codebase in one Claude Code run

Once the context window fills up on a massive audit, Claude starts dropping details, and you are left with a report that is hard to trust. Anthropic solved this with a new feature that moves the execution entirely outside the context window.

Now, Claude writes a JavaScript orchestration script that fans tasks out across hundreds of parallel subagents. It stores every intermediate result in script variables rather than the conversation, returning only the verified final answer.

It's on by default on the Max and Team plans and requires Claude Code v2.1.154 or later. Turn on auto mode, then describe the audit and ask Claude to build a workflow:

Create a workflow that audits every endpoint under src/routes/ for missing auth checks and report what you find without changing any code.

Claude shows the plan and asks for confirmation before running. It's a good idea to scope your first one to a single folder, since a workflow burns through way more tokens than a regular session.
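If the fan-out idea feels abstract, here is a rough Python sketch of the pattern. It is not the script Claude generates (that one is JavaScript, and its internals aren't shown here). The `audit_file` function is a hypothetical stand-in for a subagent, and the file names are invented. What it illustrates: tasks run in parallel, intermediate results stay in ordinary variables, and only a short final answer comes back.

```python
import asyncio


async def audit_file(path: str) -> dict:
    """Hypothetical stand-in for one subagent auditing one route file.

    A real subagent would read and reason about the file; this stub just
    flags paths containing "admin" so the sketch is runnable.
    """
    await asyncio.sleep(0)  # yield control, as a real remote call would
    return {"path": path, "missing_auth": "admin" in path}


async def run_audit(paths: list[str], max_parallel: int = 8) -> list[str]:
    """Fan tasks out, hold intermediate results locally, return a summary."""
    limit = asyncio.Semaphore(max_parallel)  # cap concurrent subagents

    async def bounded(path: str) -> dict:
        async with limit:
            return await audit_file(path)

    # Intermediate results live in this variable, not in a conversation.
    results = await asyncio.gather(*(bounded(p) for p in paths))
    return sorted(r["path"] for r in results if r["missing_auth"])


# Invented file names under the src/routes/ folder from the prompt above.
paths = ["src/routes/users.ts", "src/routes/admin.ts", "src/routes/health.ts"]
findings = asyncio.run(run_audit(paths))
print(findings)  # ['src/routes/admin.ts']
```

The `max_parallel` cap is a design choice for this sketch: it bounds how many tasks run at once, which is one simple way to keep a big fan-out from burning through resources all at the same moment.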

P.S. Get 50+ AI coding hacks for Claude Code, Cursor, and Codex here.

TOP & TRENDING RESOURCES

Click here to watch the tutorial.

Top Tool

Phasr: A workspace orchestration platform for engineers that manages dozens of parallel coding workflows and repositories simultaneously, allowing you to debug, review, and scale development without ever losing context.

Top Repo

SkillSpector (by Nvidia): A security scanner built for AI agent skills. It flags vulnerabilities, malicious patterns, and potential security risks before you ever hit install.

Trending Cookbook

Getting started with OpenAI Models on Amazon Bedrock: This guide gives developers a blueprint for deploying OpenAI models on Amazon Bedrock within complex production environments. You'll discover how to leverage the standard OpenAI SDK to handle everything from text generation and structured JSON to tool calling and file inputs seamlessly.

Our most-clicked story from Friday

OpenAI’s Codex engineering lead has a warning for devs over-relying on AI agents. This clip shares the one critical step he says you just can’t hand off.

Until next time — The Code team