Castform drops its RL training platform, OpenAI unveils dev mode in Codex

Welcome back. Coding tasks that would cost thousands on an API can run for weeks on a $200 subscription. A viral report just compared every tier of Claude and ChatGPT to see how they stack up — and OpenAI's top plan turned out nearly twice as generous as Anthropic's.

You'd think numbers like these would force prices up. OpenAI is reportedly about to do the opposite.

Also: Cut your Claude Code token usage in half, OpenClaw creator's loop engineering setup, and which resumes actually land a job at Anthropic.

Help us improve: what was your favorite part of today's issue? Click your pick in the poll below.

Today’s Insights

Powerful new updates and hacks for devs
An AI researcher breaks down how to build a vertical agent
How to stop Opus from draining your token usage
Trending social posts, top repos, and more

TODAY IN PROGRAMMING

Click here to watch Castform in action.

Castform enables every team to train their own models: The SF-based startup just dropped its RL training platform in open beta. They are betting that the old advice to never train a custom model is outdated. With Castform, teams simply define what a good output means to them. The platform then handles data prep, GPU infrastructure, and algorithm tuning. They claim task-trained open models close the gap with closed APIs at a fraction of the cost. See an example setup to train a RAG agent.

OpenAI gives Codex the keys to Chrome's debugger: The ChatGPT maker just rolled out Developer Mode for browser use in Chrome and the Codex in-app browser. It gives the coding agent access to the Chrome DevTools Protocol. This lets it profile JavaScript performance and inspect console output, network traffic, and page state to track down bugs. Since this involves touching sensitive browser data, Codex asks for explicit approval before inspecting a site.

Anthropic's CEO demands mandatory safety tests for AI: Dario Amodei published Policy on the AI Exponential, arguing the technology is advancing far faster than the rules meant to govern it. He says transparency is no longer enough and proposes mandatory third-party safety testing for frontier models, with governments empowered to block releases that fail. The essay maps five policy areas needing a rethink, from job displacement to civil liberties. Read the full essay.

PRESENTED BY AWS

AI and Cloud Innovation Starts in DC

AWS Summit DC brings together developers, architects, and public-sector leaders for two days of technical sessions, hands-on demos, and real-world insights into AI, security, and modernization. No cost to attend.

Register now for June 30–July 1.

INSIGHT

How to build a good vertical agent: Lessons from an AI researcher

Source: The Code, Superhuman

The secret is out. This year, the recipe for building AI agents went public. It is basically just a while loop where a model calls tools until the job is finished. You can build one in an afternoon. But that only gets you a generic agent. The real prize is a vertical agent. These are built for one specific field. Customers choose them because they are more accurate. A founding scientist at Fundamental Research Labs, Peter Wang, just dropped a playbook on how to build a high-quality one.

A year of getting it right. Wang spent the last year building a spreadsheet agent for some of the world's biggest hedge funds. In that world, one wrong number can cost a lot of money. His first design choice was unusual. He gave the agent one tool instead of thirty. The agent simply writes code, and that code handles all the reading and writing. Having fewer tools means fewer chances to mess up. Beyond that, accuracy is all about context. Give the model too much and it loses track of what matters. Give it too little and it starts to guess.

The fix comes from hardware. A processor keeps its most important data in a small, fast cache. Everything else sits in slower memory until it is actually needed. Wang organizes an agent's context the same way, sorting it by how often each task comes up. Daily work like reading and writing cells stays right in the prompt. Occasional jobs like pivot tables become short guides that the agent pulls up on demand. The really rare stuff lives in one big API file that the agent can search.

Apply it to your domain. Wang argues that this structure works for any project because you understand your users better than anyone else. Just ask yourself three questions:

What does your agent do all the time?
What does it do every once in a while?
What is the backup plan for everything else?

Put each task in the right spot to keep the agent fast and accurate. The full playbook is definitely worth reading.

IN THE KNOW

What’s trending on socials and headlines

Meme of the day.

Limit Loophole: Fable 5 is so overpowered you shouldn't be using it for every step. This workflow stretches your Claude Code limits (7K bookmarks).
Fast Track: An Atlassian Principal Engineer condensed his 12-year climb from backend dev into one playbook you can run at any company (19K bookmarks).
Loop Revealed: After loop engineering blew up, OpenClaw's creator posted his real setup, an agent that triages and ships work while he's away (7.4K bookmarks).
Half the Burn: This developer cut his weekly Claude Code token usage in half with one change to how his models split the work (2.8K bookmarks).
Zero Handwriting: Conductor's CEO broke down his AI coding setup in a new YC video. He writes no code by hand anymore, except in "caveman mode" (3K likes).
Parallel Power: This walkthrough teaches you to run 20 coding agents at once without them stepping on each other's code.
Hiring Decoded: An AI startup's Head of Talent ran the numbers on 1,680 Anthropic engineer resumes. The results say a lot about who gets hired there.

AI CODING HACK

How to stop Opus from draining your token usage

Opus writes better plans than Sonnet, but it can burn through your weekly quota in days. Anthropic’s documentation lists an alias that automates the trade-off most devs already do manually. Use Opus for thinking and Sonnet for typing.

Just use "/model opusplan" and press shift+tab to enter plan mode. Claude will use Opus for the planning phase. Once it starts writing code, it switches to Sonnet automatically. The context carries over, so Sonnet knows exactly what Opus decided. To make this permanent, start every session with this command:

claude --model opusplan

Codely, an AI education platform, compared the models in its own workflow and concluded Opus's edge is largest in planning and narrows during implementation.

P.S. Get 50+ AI coding hacks for Claude Code, Cursor, and Codex here.

TOP & TRENDING RESOURCES

Click here to watch the tutorial.

Top Tool

Nessie (Recommended by YC CEO): This tool makes every AI agent you use smarter. It turns all your chats and notes into a single searchable and shareable memory layer that generates reusable, project-ready context.

Top Repo

Agentsview (2K ⭐): Local-first session intelligence and analytics for coding agents. It lets you browse, search, and track costs across all your AI coding agents.

Trending Cookbook

Async multi-agent orchestration (by Anthropic): Managing communication between multiple AI agents can get tricky when you're dealing with complex task logic. This cookbook shows that effective orchestration actually boils down to just two straightforward asynchronous patterns: a fixed team and dynamic subagents.

Our most-clicked story from yesterday

This six-step process helps you start thinking like a senior developer, using a coding agent as your personal mentor.

Grow customers & revenue: Join companies like Google, IBM, and Datadog. Showcase your product to our 300K+ engineers and 150K+ followers on socials. Get in touch.

Which was your favourite section in this newsletter?

Your feedback helps us create better emails for you!

You can also reply directly to this email if you have suggestions, feedback, or questions.

Until next time — The Code team