Vercel drops EVE, GitHub ships a desktop home for coding agents

Welcome back. Back in late February, Block cut 4,000 jobs and credited the move to AI. Now we've seen the proof. Their internal agent system is already merging 1,500 PRs a week and handling 15% of all code changes company-wide. And Block just shared the full breakdown of exactly how they're changing the way they build.

Also: How to run agents while you sleep, what a former Meta engineer hands to AI, and which company the mind behind the Transformer just left Google for.

Today’s Insights

Powerful new updates and hacks for devs
How to measure the impact of AI-assisted engineering
How to make Claude Code write useful plans
Trending social posts, top repos, and more

TODAY IN PROGRAMMING

Click here to watch how Vercel’s Eve works.

Vercel open sources a framework for production agents: The cloud hosting platform just dropped eve, which treats an agent as a simple directory of files instead of a complex web of custom code. It comes with built-in durable execution, sandboxes, human approvals, and evals. The framework works with any model and any MCP server, and the same agent runs across Slack, Discord, and GitHub. More than a hundred agents already run on it in production.

OpenAI's Codex runs on any open model: The ChatGPT maker's coding agent supports more than just its own models, a feature that resurfaced this week. You can use an open-source flag to route Codex through Ollama or LM Studio on your local machine, or even connect a custom provider like Azure via your config file. This allows you to keep proprietary code in-house and avoid per-token costs.

GitHub ships a desktop home for its coding agents: The code-hosting giant just made its Copilot app generally available on Mac, Windows, and Linux. Developers can now start a session from an issue, PR, or prompt, and even run multiple sessions in parallel, each on its own branch. The new canvases put you and the agent on the same plan, terminal, or browser, so progress stays steerable. You can also schedule recurring work in the cloud and bring your own model.

PRESENTED BY VANTA

The enterprise deal is ready. Is your SOC 2?

The fastest-growing startups know that when customers ask for a SOC 2 report, they need to have it ready to go.

That's where Vanta comes in. Used by over 16,000 global companies like Ramp, Cursor, and Harvey, Vanta helps you get audit-ready quickly — and stay that way.

The Vanta Agent runs in the background like a 24/7 GRC engineer, pulling evidence, drafting fixes, and answering questionnaires.

Don't let compliance be the reason a deal slips. Watch the on-demand demo to see how Vanta works.

See it now

INSIGHT

Lines of Code is a vanity metric. Here’s how top engineering teams measure productivity:

Source: The Code, Superhuman

The unmeasured bill. Engineering teams are burning tokens without seeing real value. This happens because AI output is notoriously hard to track. After analyzing thousands of sessions, the Devin team at Cognition built a system to measure actual results. Now, they’re sharing their framework.

The activity trap. Tracking tokens and commits is easy but misleading. Cognition learned that code volume doesn't equal effort. A complex two-line fix can take all day. Meanwhile, a basic refactor creates thousands of lines in minutes. Measuring output just shows activity. It does not show how much work the agent actually took off your plate.

Measure in hours. Their solution shifts the focus to a simple question. How long would a human have taken to do this? Cognition uses an agent to review every session. It analyzes the full process rather than just the final code. They account for the investigation and the dead ends that never make it into a pull request. This allows them to measure the true engineering effort being saved.

Use the setup. You don't need to build Cognition's estimator to apply this. Ask your developers how many hours AI saved them each week, then check that against your delivery data. From there, DX, a developer-productivity research firm, suggests in its guide that you pair that output metric with a quality one, like change failure rate. This tells you more than how much was done. It tells you whether it was done well. And stop reporting lines of AI-written code. That only proves the tool was used. It does not prove it provided value.
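To make the pairing concrete, here is a minimal sketch of the two numbers side by side. It is not Cognition's estimator or DX's tooling: the `Session` fields, the sample values, and the choice to count only shipped work are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One agent session (hypothetical record shape, not a real schema)."""
    estimated_human_hours: float  # reviewer's estimate of how long a human would have taken
    shipped: bool                 # did the change reach production?
    caused_failure: bool          # did the shipped change need a rollback or hotfix?


def hours_saved(sessions):
    """Output metric: estimated human hours replaced by shipped agent work."""
    return sum(s.estimated_human_hours for s in sessions if s.shipped)


def change_failure_rate(sessions):
    """Quality metric: share of shipped changes that caused a failure."""
    shipped = [s for s in sessions if s.shipped]
    if not shipped:
        return 0.0
    return sum(1 for s in shipped if s.caused_failure) / len(shipped)


# Made-up sample data: effort is estimated per session, not per line of code.
sessions = [
    Session(6.0, shipped=True, caused_failure=False),   # two-line fix, long investigation
    Session(0.5, shipped=True, caused_failure=False),   # large mechanical refactor
    Session(3.0, shipped=True, caused_failure=True),    # shipped, then rolled back
    Session(2.0, shipped=False, caused_failure=False),  # never merged
]

weekly_hours = hours_saved(sessions)
weekly_cfr = change_failure_rate(sessions)
```

Reading the two values together is the point: a rising hours-saved figure alongside a rising failure rate suggests the saved time is being paid back in rework.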

PRESENTED BY CODEWALNUT

Are you still coding like it's 2022? Here's your guide to agentic coding:

AI-generated code is everywhere.

But engineers with agentic coding skills are in short supply.

You are no longer paid to write code. You are in demand if you can command AI agents.

Join a 90-minute hands-on masterclass on agentic coding. See how to build software without writing a single line of code.

Don’t let juniors with Cursor outpace your engineering experience.

Reserve your seat here ->

IN THE KNOW

What’s trending on socials and headlines

Meme of the day.

Loop Logic: The creator of Claude Code doesn't prompt Claude anymore. This guide shows how to build loops that prompt your agents while you sleep (1.5M views).
Agent Playbook: Prompt engineering is the easy part. This thread breaks down the skill that keeps agents from failing on step 20.
Grunt Work: A former Meta staff engineer offloads his most tedious work to AI. Here's the unglamorous chore list that keeps a codebase from rotting.
Plan Mode Upgrade: This open-source skill turns Claude Code's wall-of-text plans into wireframes you can mark up before the agent touches anything.
Voice Coder Fix: If you dictate code with tools like Wispr Flow, this one line in your CLAUDE.md tells the agent to read past transcription errors.
Caught Cheating: This site created by an AI engineer at Galileo shows you how recursive self-improvement loops work and how they break.
Shazeer Moves: One of the minds behind the architecture inside every major LLM is leaving Google for OpenAI. The talent war isn't cooling off (4.7M views).
Switch Flip: The U.S. just blocked Anthropic's newest models from going abroad, and world leaders at the G7 are spooked that American AI could get shut off on them at any moment.

AI CODING HACK

How to make Claude Code write useful plans

Claude Code implementation plans often end up as massive Markdown files that nobody actually reads. This breaks the human review process because the output is just too clunky. Thariq Shihipar from Anthropic solved this by ditching Markdown for HTML.

Markdown is limited to basic text and ASCII art, whereas HTML allows for actual tables, SVG diagrams, and real mockups. You can open these right in your browser. To fix your workflow, just add this one line to your planning prompt:

Create a thorough implementation plan in a HTML file, be sure to make some mockups, show data flow and add important code snippets I might want to review. Make it easy to read and digest.

Claude writes the plan as a styled page rather than a wall of text. You can open it, scan the diagrams, and review relevant snippets instantly. Share the file with a link instead of a bulky attachment.

This approach also works for PR reviews, incident reports, and design exploration. Start with the planning prompt, then pull from Thariq's template gallery as needed.
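If you reuse that line across tasks, a small helper can append it to any planning prompt. This is a minimal sketch: `with_html_plan` is a hypothetical name, not part of Claude Code, and it assumes you pass the resulting string to your agent yourself.

```python
# The instruction line from the hack above, kept verbatim.
HTML_PLAN_LINE = (
    "Create a thorough implementation plan in a HTML file, be sure to make "
    "some mockups, show data flow and add important code snippets I might "
    "want to review. Make it easy to read and digest."
)


def with_html_plan(task_prompt: str) -> str:
    """Append the HTML-plan instruction to a task description (hypothetical helper)."""
    return f"{task_prompt.rstrip()}\n\n{HTML_PLAN_LINE}"


# Example task text is made up for illustration.
prompt = with_html_plan("Add rate limiting to the public API.")
```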

P.S. Get 50+ AI coding hacks for Claude Code, Cursor, and Codex here.

TOP & TRENDING RESOURCES

Click here to watch the tutorial.

Top Tool

PandaProbe Cloud: An open-source agent engineering platform built for deep observability. It lets you trace, evaluate, monitor, and debug your AI agents across both development and production.

Top Repo

Skills For Real Engineers (134K ⭐): Software engineering fundamentals matter more than ever. This repo is an ex-Vercel engineer's best effort at condensing these basics into repeatable practices, helping you ship the best apps of your career.

Trending Paper

How models may behave in real-world use before release (by OpenAI): Traditional safety tests struggle to predict real-world AI behavior because models often realize they are being evaluated. Replaying past user conversations to simulate deployments fixes this, vastly improving risk forecasts and catching hidden flaws before release.

IN CASE YOU MISSED IT

Our most-clicked story from yesterday

Developers are stockpiling local coding models that run entirely offline. Here is the best one to download before you need it.

Grow customers & revenue: Join companies like Google, IBM, and Datadog. Showcase your product to our 300K+ engineers and 150K+ followers on socials. Get in touch.

What did you think of today's newsletter?

Your feedback helps us create better emails for you!

You can also reply directly to this email if you have suggestions, feedback, or questions.

Until next time — The Code team