Cursor drops Composer 2.5, Cognition unveils Devin Auto-Triage

Welcome back. Every tech boom has winners and losers, but AI is polarizing Silicon Valley like never before. In a viral post with 12 million views, a renowned VC breaks down the divide, arguing that even those who "made it" aren't fine. Read the full perspective here.

Also: Ex-Google engineer's $400K remote job blueprint, the 7 untapped side-project ideas from Notion's former Head of Community, and why Karpathy says he's never felt more behind.

Today’s Insights

Powerful new updates and hacks for devs
What on earth is context pruning
How to catch Claude Code's spec drift
Trending social posts, top repos, and more

TODAY IN PROGRAMMING

Check out how Composer 2.5 performs vs other frontier models.

Cursor ships its most capable coding model yet: The AI coding startup just dropped Composer 2.5, a model designed for long-running tasks that handles complex instructions far more reliably than the previous version. It goes toe-to-toe with Opus 4.7 and GPT-5.5 on coding benchmarks while costing just $0.50 per million input tokens, a fraction of what those models charge. The team is also partnering with xAI to train an even bigger model from scratch on Colossus 2.

Cloudflare red-teams its own code with Anthropic's Mythos: The web infrastructure company just released results from Project Glasswing, where it tested Mythos against 50 of its own code repositories. The model was able to chain minor bugs into major security holes, even writing working proof-of-concept code. Cloudflare's main takeaway is that simply patching faster isn't the answer. Instead, engineering teams need to build resilient architectures that can actually withstand the next inevitable exploit.

Cognition ships an always-on bug triage agent: The SF-based startup just shipped Auto-Triage, a persistent agent that monitors your Slack channels and investigates bugs as soon as they're reported. A parent Devin session filters out the noise before spinning up focused sub-sessions that find root causes, post diagnoses, and tag the appropriate code owner. With a shared long-term memory, it can deduplicate repeat reports and learn the team's ownership map.

PRESENTED BY WISPR

Cursor for code. Claude for thinking. What about input?

Your dev stack got an AI upgrade everywhere except the input layer. You're still typing every prompt, every ticket, every review comment by hand.

Wispr Flow closes that gap. Dictate into Cursor, VS Code, Slack, Linear, or anywhere else you work. It's syntax-aware: camelCase, snake_case, acronyms, and file names all come through clean. Mention a file in Cursor or Windsurf, and it auto-tags.

It's the voice layer for an AI-native workflow. Speak your intent. Your tools do the rest.

Available on Mac, Windows, iPhone, and Android. Used by millions of developers, including teams at OpenAI and Mercury.

Try free

INSIGHT

What on earth is context pruning

Source: The Code, Superhuman

Context windows keep growing. Every model release this year has pushed limits higher, and teams have responded by stuffing prompts with retrieved passages, chat histories, and boilerplate. The bet was that more context meant better answers. Instead, costs are climbing, latency is creeping up, and output quality is dropping. Bigger windows haven't fixed the problem they were supposed to solve.

Lost in the middle. LLMs tend to overlook information buried in the middle of long prompts, and length alone tanks performance regardless of what's in it. A recent study found a model losing nearly 70 points of accuracy on a standard knowledge benchmark just because researchers padded the prompt with filler tokens. Million-token windows have effective lengths far shorter than the marketing suggests.

Context pruning is the solution. This technique works by scoring every bit of input (tokens, sentences, or chunks) and tossing out the low-value parts before the model even gets to them. Teams running RAG are using it to trim the same bloated passages and chat histories that clutter their prompts. It cuts costs, lowers latency, and usually leads to better results because the model isn't struggling to find the signal in the noise.
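Here's a minimal sketch of chunk-level pruning in Python. It assumes plain word overlap with the user's query as the relevance score and a word count as a rough token estimate; both are stand-ins for the learned scorers and real tokenizers a production setup would likely use, and `prune_context` is a hypothetical helper, not any library's API. It keeps the best-scoring chunks that fit a budget and returns them in their original order.

```python
import re


def word_set(text: str) -> set[str]:
    # Lowercased alphanumeric words; a crude stand-in for a real tokenizer.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_chunk(chunk: str, query: str) -> float:
    # Fraction of the query's words that also appear in the chunk.
    q = word_set(query)
    if not q:
        return 0.0
    return len(q & word_set(chunk)) / len(q)


def prune_context(chunks: list[str], query: str, budget: int) -> list[str]:
    """Keep the highest-scoring chunks that fit within `budget` words,
    returned in their original order so the prompt still reads naturally."""
    ranked = sorted(range(len(chunks)),
                    key=lambda i: score_chunk(chunks[i], query),
                    reverse=True)
    kept, used = set(), 0
    for i in ranked:
        if score_chunk(chunks[i], query) == 0:
            break  # ranked descending, so every remaining chunk scores 0 too
        cost = len(chunks[i].split())  # word count as a rough token estimate
        if used + cost <= budget:
            kept.add(i)
            used += cost
    return [chunks[i] for i in sorted(kept)]


chunks = [
    "Refund requests are processed within 5 business days.",
    "Our office dog is named Biscuit.",
    "To request a refund, open Settings and choose Billing.",
    "The company was founded in 2015.",
]
query = "How do I request a refund?"
print(prune_context(chunks, query, budget=20))
```

With a 20-word budget, the sketch keeps the two refund chunks and drops the off-topic ones; at 10 words it keeps only the best match. Returning the survivors in their original order is deliberate, so the pruned prompt still reads coherently.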

Code and chats break differently. Trimming individual tokens is risky because it can break code structure or disrupt the flow of a conversation. Chunk-level methods, which keep entire functions or sentences intact, are much more effective. This hands-on guide walks through which technique best fits your workload.
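For code, a chunk-level approach can be as simple as cutting at definition boundaries. This minimal sketch assumes Python source and substring matching on query terms (both illustrative choices; `function_chunks` and `prune_code` are hypothetical helpers). It uses the standard library's `ast` module to split a file into whole top-level functions and classes, so whatever survives pruning is still valid Python.

```python
import ast


def function_chunks(source: str) -> dict[str, str]:
    """Split Python source into whole top-level functions and classes, keyed by name."""
    tree = ast.parse(source)
    chunks: dict[str, str] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the node's exact original text
            # (decorators are not included in this simple version).
            chunks[node.name] = ast.get_source_segment(source, node)
    return chunks


def prune_code(source: str, terms: list[str]) -> str:
    """Keep every definition that mentions a query term, each one intact."""
    kept = [text for text in function_chunks(source).values()
            if any(term in text for term in terms)]
    return "\n\n".join(kept)


example = '''
import math

def area(r):
    return math.pi * r ** 2

def greet(name):
    return "hi " + name

class Circle:
    def __init__(self, r):
        self.r = r
'''

pruned = prune_code(example, ["math"])
ast.parse(pruned)  # still parses: no half-deleted functions
print(pruned)
```

Note that the pruned output drops `import math`; a fuller pruner would probably keep module-level imports regardless of score.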

IN THE KNOW

What’s trending on socials and headlines

Meme of the day.

Remote Roadmap: An ex-Google engineer just shared the 5-part system she'd use to land a $400K remote software job from scratch today.
Power Mode: Most developers run Claude Code like a smarter ChatGPT. These 12 setup tricks unlock a full AI engineering environment.
Inside Track: Google DeepMind's Gemini Pre-training Lead just dropped the blueprint to land a job at a top AI lab (1M views).
Code Standards: This viral GitHub repo extracts the HTML, CSS, and JS best practices every dev should add to their library (9K stars, 2.9K bookmarks).
Monorepo Mode: Anthropic just published the patterns top teams use to deploy Claude Code across million-line monorepos and decades-old legacy systems.
Karpathy Confession: A year after coining "vibe coding," Andrej Karpathy admits he's never felt more behind. The AI code he over-trusted now gives him "heart attacks." One dev breaks it down.
Agent Stack: This thread breaks down the 12 integrations that turn Hermes into a real teammate across code, comms, and revenue.
Side Quests: Notion's former Head of Community just shared 7 untapped side-project ideas you can start building today.

AI CODING HACK

How to catch Claude Code's spec drift

Claude Code makes dozens of decisions when implementing a spec, from resolving ambiguities to picking tradeoffs. Since these don't show up in the diff, you often only spot issues when things break in review or production.

An engineer on Anthropic's Claude Code team shared a prompt that has the tool log every decision to a file as it works. Just use it as your implementation request, swapping your spec in for <SPEC>:

Implement <SPEC>. As you work, maintain a running implementation-notes.html file that captures anything I should know about how the implementation diverges from or interprets the spec, including:

- Design decisions: choices you made where the spec was ambiguous
- Deviations: places where you intentionally departed from the spec, and why
- Tradeoffs: alternatives you considered and why you picked what you did
- Open questions: anything you'd want me to confirm or revise

By reading the file once the task is finished, you'll know exactly which decisions were made and why before you even dive into the code.
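If you want a quick sanity check before review, here's a hypothetical Python helper (not part of Claude Code) that reads the notes file and flags any of the four categories that never show up. It assumes the notes label their sections with the same words the prompt uses, which won't always hold (for example "Trade-offs" instead of "Tradeoffs"), so treat a miss as a cue to look closer rather than a hard failure.

```python
from html.parser import HTMLParser

# The four categories requested in the prompt. Matching is a plain,
# case-insensitive substring check: an assumption about how the notes are labeled.
REQUIRED_SECTIONS = ["Design decisions", "Deviations", "Tradeoffs", "Open questions"]


class TextCollector(HTMLParser):
    """Collects the text content of an HTML document, ignoring tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def missing_sections(html: str) -> list[str]:
    """Return the required section labels that never appear in the notes."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.parts).lower()
    return [label for label in REQUIRED_SECTIONS if label.lower() not in text]


# Usage: check the notes file before starting code review.
# with open("implementation-notes.html", encoding="utf-8") as f:
#     print(missing_sections(f.read()))
```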

P.S. You can find 50+ AI coding hacks here.

TOP & TRENDING RESOURCES

Click here to watch the tutorial.

Top Tool

DesignMD: This tool analyzes live websites to extract structured design intelligence like typography, spacing, and colors. It lets you paste any URL to instantly generate actionable design system insights and AI-ready prompts.

Top Repo

Browse.sh (by Browserbase): A browser CLI for your agents. It provides a single interface for skills, browser primitives, debugging, and cloud sessions, all built specifically for agent-driven workflows.

Trending Paper

How to use /goal in Codex (by OpenAI): Standard prompts often fail at long tasks because you have to keep restating the end goal after every step. Codex addresses this with /goal, which keeps the model focused on a single objective until it's finished. This cookbook covers when to use it, how it changes the workflow, and how to write clear goals with specific constraints and success criteria.

Grow customers & revenue: Join companies like Google, IBM, and Datadog. Showcase your product to our 280K+ engineers and 150K+ followers on socials. Get in touch.

What did you think of today's newsletter?

Your feedback helps us create better emails for you!

You can also reply directly to this email if you have suggestions, feedback, or questions.

Until next time — The Code team