Factory drops next version of coding agents, Cartesia ships Sonic-3.5

Welcome back. Call a paid plan "Max" and users will hold you to it. One customer has reportedly sued Anthropic for fraud, claiming his $200 plan delivers far less than advertised. The AI lab hasn't responded to the allegation yet, but according to SemiAnalysis, the Claude maker is losing up to $8,000 on Max plans.

Also: a Google Cloud AI director's guide to agentic code review, the seven phases of an AI coding process, and why Anthropic quietly reversed its Agent SDK quota plan.

Today’s Insights

Powerful new updates and hacks for devs
Spec-driven development with coding agents
How to stop unused MCP servers from slowing Cursor
Trending social posts, top repos, and more

TODAY IN PROGRAMMING

Factory turns from coding agents to software factories: The agentic coding startup just unveiled Factory 2.0, which takes its Droids (agents) from coding assistants into full-scale development engines. Now, they handle everything from triaging bug reports to shipping the final code in one seamless loop, while a Router picks the most efficient model for each job. It’s already being used by NVIDIA, Adobe, and EY, and it’s shifting the engineer's role from writing code to building the systems that write it.

Cartesia ships two top-tier streaming models for voice agents: The voice AI startup dropped Sonic-3.5 and Ink-2, a speech model and a transcription model that both sit at the top of the Artificial Analysis leaderboards. Built on state-space models, the pair runs through a single API. Cartesia says Sonic-3.5 hits sub-90 ms latency and Ink-2 handles transcription with built-in turn detection. It's a straightforward pitch: engineering teams can finally stop juggling different vendors for their real-time voice pipeline.

Security leaders protest the export ban on Anthropic's models: More than 150 researchers and executives just signed an open letter urging the US government to reverse controls on Claude Mythos and Fable. Their argument is that pulling the strongest bug-hunting tools away from defenders only helps attackers, since the same capability already shows up in rivals like GPT-5.5 and even Anthropic's own Opus. They want any AI rules grounded in open, scientific evaluation.

PRESENTED BY WISPR

Your agent is only as good as your prompt.

Claude Code, Cursor, Codex. The output ceiling is insane. But "fix the bug" gets you slop. Detailed context gets you shipping code.

The problem: typing out file references, edge cases, and architecture context 40 times a day is exhausting. So you cut corners. Your agent gets lazy input. Wispr Flow lets you speak full prompts with syntax-aware dictation. camelCase, snake_case, file names all preserved. Auto-tags files in Cursor and Windsurf.

Used by engineers at OpenAI, Vercel, and Clay. Works on Mac, Windows, iPhone, and Android. Better prompts in. Better code out.

Try free

INSIGHT

Silicon Valley's best teams stopped letting coding agents guess

Source: The Code, Superhuman

Guessing doesn't scale. For two years, teams shipped AI code the same way: prompt the agent, let it fill the gaps with a guess, and clean up the rest. This worked for small tasks. But it failed as codebases grew, because the bigger the system, the more of what you want lives in details you never typed. One analysis found that 31 percent more PRs now merge with no review. The main issue is no longer how fast the agent writes code. It is whether anyone wrote down what the code is actually supposed to do.

Write the spec, then the code. When the output isn't quite right, most teams just try writing a longer prompt. That's a trap. Apoorv Gupta, a Principal Software Engineer at Microsoft, suggests a better move: align on the goals first, then let the AI work against a clear spec. You give it one solid resource that covers what to build, the constraints, and the success criteria. At that point, the code is just the output, while the spec becomes the actual document you maintain.
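
As a rough illustration of that single document, here is a minimal Python sketch. The FeatureSpec class, its field names, and the example feature are all hypothetical, not part of Spec Kit or any other tool mentioned here. It only shows the three parts named above (what to build, constraints, success criteria) kept in one place and rendered as plain text for an agent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FeatureSpec:
    """Hypothetical container for the three parts of a spec."""

    what_to_build: str
    constraints: list[str] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        missing = []
        if not self.what_to_build.strip():
            missing.append("what_to_build")
        if not self.constraints:
            missing.append("constraints")
        if not self.success_criteria:
            missing.append("success_criteria")
        return missing

    def render(self) -> str:
        """Render the spec as plain text to hand to a coding agent."""
        lines = ["WHAT TO BUILD", self.what_to_build, "", "CONSTRAINTS"]
        lines += [f"- {c}" for c in self.constraints]
        lines += ["", "SUCCESS CRITERIA"]
        lines += [f"- {s}" for s in self.success_criteria]
        return "\n".join(lines)


# Made-up example feature, for illustration only.
spec = FeatureSpec(
    what_to_build="Add CSV export to the invoices page.",
    constraints=["Reuse the existing auth middleware", "No new dependencies"],
    success_criteria=["Export of 10,000 rows finishes without timing out"],
)
print(spec.render())
```

Checking missing_sections() before handing the text to an agent is one cheap way to catch a spec that still leaves the agent guessing.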

How to start this week. You don't need a new platform, just one loop per feature.

Install GitHub's Spec Kit and run its loop: constitution, specify, plan, tasks, implement, validate.
Write the constitution first (mission, stack, guardrails) so the agent loads the same rules each session.
On legacy code, draft the spec from your existing docs and cover only what you're changing. JetBrains' course runs it on a real repo.

Don't spec everything. Run the full lifecycle for every minor change and you'll spend more time writing documents than shipping code. Distinguished Engineer Birgitta Böckeler once saw a spec tool turn a simple bug fix into four user stories and sixteen criteria. Save that level of detail for work that needs to last or involves other team members. For a quick fix with a clear test, just prompt it and move on.

PRESENTED BY IBM

CEO + CAIO: the discordant duo

From “get our teams on board” to “how can our IT systems support all these agents and assets?”, decentralized AI adoption has exposed a rift between CEO expectations and CAIO operational reality. But alignment and trust can grow by examining:

Whether the high-level infrastructure is too fragmented.
Who identifies which AI initiatives to pursue—and how.
How a hub-and-spoke AI framework could improve ROI.

Read how the C-suite can unite on a shared AI vision

IN THE KNOW

What’s trending on socials and headlines

Seven Phases: An ex-Vercel engineer breaks down his entire AI coding process into seven phases (3K bookmarks).
Set and Forget: Most devs prompt coding agents turn by turn. This guide shows how to wrap them in goals, loops, and verifiers so they keep working after you stop typing.
Review Crisis: AI now writes more code than anyone can review, and that pileup is the real bottleneck. A director at Google Cloud AI maps where the real work moved.
One Tab: OpenAI's new Codex plugin sets up your API keys, pulls the right docs, and debugs your code without leaving the editor.
Parallel Agents: Ghostty's creator shared the exact workflow he uses to run parallel coding agents on real feature work (2.5K likes).
Quota Walk-Back: Anthropic paused its plan to push Agent SDK and claude -p off subscription quotas. One developer reads it as a bigger strategy shift.
Off the Cloud: One ML engineer runs agentic coding entirely on a 2022 Mac, hitting ~75% of frontier accuracy. Her full local setup is here.

AI CODING HACK

How to stop unused MCP servers from slowing Cursor

Connecting several MCP servers in Cursor slows down every message because they load tool definitions into context automatically. One audit found that an idle server can add 17,600 tokens per message. Cursor 3.7 now makes this cost visible.

Click the context ring on your agent to see the Context Usage Report. It shows exactly where tokens are going across your rules, skills, and servers. Once you find the heavy ones, disable what you don't need. Use the Cursor CLI to turn off a server by name:

/mcp disable <server-name>

Replace <server-name> with the name shown in your report. You can also toggle servers in Settings under MCP. Disabling unused servers reduces message size, giving the agent more room for your code and keeping it from hitting context limits.
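
To get a feel for the numbers, here is a back-of-the-envelope Python sketch. It assumes every connected server costs the 17,600 tokens the audit measured for one idle server, and it uses a hypothetical 200,000-token context window. The server names are made up, so swap in the real figures from your Context Usage Report.

```python
from __future__ import annotations

# Assumed figures for illustration only; read the real ones from your report.
IDLE_SERVER_TOKENS = 17_600  # per-message cost cited by the audit
CONTEXT_WINDOW = 200_000     # hypothetical context window size


def mcp_overhead(tokens_per_server: dict[str, int], window: int) -> tuple[int, float]:
    """Return total tool-definition tokens and their share of the window."""
    total = sum(tokens_per_server.values())
    return total, total / window


# Hypothetical server names, each assumed to cost the audited amount.
servers = {
    "docs": IDLE_SERVER_TOKENS,
    "tickets": IDLE_SERVER_TOKENS,
    "db": IDLE_SERVER_TOKENS,
}
total, share = mcp_overhead(servers, CONTEXT_WINDOW)
print(f"{total:,} tokens per message ({share:.1%} of the window)")
# prints: 52,800 tokens per message (26.4% of the window)
```

Under these assumptions, three idle servers take up roughly a quarter of the window before you type anything.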

P.S. Get 50+ AI coding hacks for Claude Code, Cursor, and Codex here.

TOP & TRENDING RESOURCES

Top Tool

Omnigent: A meta-harness that sits on top of existing agent frameworks and LLM tools to give them a unified interface. It lets you combine different agents, enforce shared policies like security and cost limits, and collaborate on live agent sessions with your team.

Top Repo

optimizerDuck (3.9K ⭐): An open-source Windows optimizer that speeds up your PC, improves privacy, and strips away bloat with a clean, straightforward UI. It packs over 30 proven system tweaks and built-in tools like startup and bloatware managers, plus a one-click revert feature with automatic backups so you can experiment safely.

Trending Paper

Self-Harness - Harnesses that improve themselves: AI operating rules are usually hand-built by humans, causing bottlenecks as models rapidly evolve. Researchers found that letting AI analyze its own mistakes to automatically upgrade these rules drastically improves performance without human intervention.

IN CASE YOU MISSED IT

Our most-clicked story from yesterday

Check out this post, which maps out a 12-stage roadmap for building production-grade agentic AI in just six months.

Grow customers & revenue: Join companies like Google, IBM, and Datadog. Showcase your product to our 300K+ engineers and 150K+ followers on socials. Get in touch.

What did you think of today's newsletter?

Your feedback helps us create better emails for you!

You can also reply directly to this email if you have suggestions, feedback, or questions.

Until next time — The Code team