Y Combinator drops Paxel, OpenAI turns ChatGPT into a coding “super app”

Welcome back. Last week’s biggest shift in how the best engineers at the top labs are coding: they’ve moved from prompting coding agents to creating loops that handle the prompting for them.

Over the weekend, devs at Anthropic and OpenAI shared how they've started using recursive loops instead of constantly babysitting their coding agents. See how one OpenAI engineer is running these autonomous scripts directly inside the Codex App.

Also: How to fine-tune a 12B model on just 8GB, run Opus autonomously for days, and why Thinking Machines CEO Mira Murati is done with turn-based AI.

Today’s Insights

Powerful new updates and hacks for devs
Why a QA engineer is the most valuable agent
How to stop Claude from shipping security flaws
Trending social posts, top repos, and more

TODAY IN PROGRAMMING

Click here to watch Y Combinator's Paxel in action.

New YC tool profiles how you code with AI: The Silicon Valley accelerator just dropped Paxel. It analyzes your sessions in Claude, Codex, and Cursor to show how you actually work. It scores your steering, execution, and planning, then assigns you an archetype like Architect or Night Owl. The analysis runs in Docker, and YC says raw code stays local while session excerpts get sent off for scoring. Founders can attach results to Startup School applications. Try it here.

OpenAI reportedly rebuilds ChatGPT around coding agents: The ChatGPT maker is reportedly weeks away from a revamped version of ChatGPT that combines coding tools and AI agents in one place. The goal is to nudge casual users toward paid products like Codex. According to the Financial Times, this is a move to win over Anthropic's business customers before an IPO. One senior employee put it bluntly: "Chat is dead."

ChatGPT's new mode blocks prompt-injection data theft: The ChatGPT maker just rolled out Lockdown Mode, an optional setting for accounts handling sensitive data. It can't stop a prompt injection from landing. Instead, it seals the exits, cutting the outbound requests attackers use to siphon out data. The catch is steep, since it switches off agent mode and deep research while blocking file downloads. It's now available across every plan.

PRESENTED BY IBM

Three-day Java modernization: a true story

Modernizing a Java 11 client-facing app often takes weeks, especially when it already serves tens of thousands of users. Blue Pearl’s application had evolved over time and was ready for an update to address deprecated APIs, ageing dependencies, and limited automated testing.

Using IBM® Bob inside the IDE, the team combined analysis, refactoring, and test creation into a focused three-day effort. The platform moved to Java 21 LTS with 92% test coverage, zero CVE-bearing dependencies, and no production incidents. JVM updates also delivered an estimated 15% performance gain.

Read the full story

INSIGHT

The QA engineer is the most valuable agent you're not running.

Source: The Code, Superhuman

Shipping blind. AI now handles nearly a third of new code at Microsoft and Google. This pace is faster than engineering teams can actually review. Standard automated tests are necessary, but they have limits. Even 100% line coverage doesn't account for every real-world scenario. Manual QA used to fill these gaps, but it’s usually the first thing cut when deadlines get tight. This creates a major bottleneck. We are using AI to build faster, but we're losing ground on structural quality.

The automation trap. When software quality drops, leaders usually demand more automated tests. But trying to script every edge case is a trap. It creates a mountain of brittle code that is hard to maintain. Redis creator Salvatore Sanfilippo ran into this exact wall. The toughest QA tasks, like catching performance issues or UX flaws, are moving targets. They require human judgment. Rigid scripts just can't do that work.

Enter the QA agent. You don't need to replace your current tests. Instead, add agents on top of them. This automates deep, human-like testing. In Sanfilippo’s experiments, an LLM agent reads new commits. It analyzes the impact and runs a custom QA pass for that specific release. For Redis Arrays, he had the agent stand up an environment with replication and persistence, then simulate days of multi-user traffic to surface anything that looked off. This acts as a safety net. The agent tests like a real user. It flags anything that feels broken, undocumented, or sloppy.

Steal the setup. Keep your existing unit tests, but manage high-level QA with a single Markdown file. Use Sanfilippo’s process: write your goals and SSH details in plain English. Tell the agent to compare your new branch against the last stable release. This forces it to focus only on new changes. Finally, ask for relative checks, like flagging speed drops. This keeps your QA fresh without using hardcoded limits.
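The relative check in that last step can be sketched in a few lines. This is a minimal illustration, not Sanfilippo's actual setup: the function name `flag_regressions`, the metric names, the sample timings, and the 10% tolerance are all assumptions made for the example. It compares timings (lower is better) from the new branch against the last stable release and flags anything that slowed down by more than the tolerance, so no absolute limit is ever hardcoded.

```python
from typing import Dict, List


def flag_regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.10,
) -> List[str]:
    """Return names of metrics where the candidate is slower than baseline.

    Values are timings (lower is better). A metric is flagged when the
    candidate exceeds the baseline by more than `tolerance` (a fraction,
    so 0.10 means 10%). Metrics missing from the candidate are skipped.
    """
    flagged = []
    for name, base_value in baseline.items():
        new_value = candidate.get(name)
        if new_value is None or base_value <= 0:
            continue  # nothing meaningful to compare against
        slowdown = (new_value - base_value) / base_value
        if slowdown > tolerance:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical timings in milliseconds, for illustration only.
stable = {"set": 1.0, "get": 0.8, "range_scan": 5.0}
branch = {"set": 1.05, "get": 1.0, "range_scan": 5.2}
regressions = flag_regressions(stable, branch)
print(regressions)
```

Because the threshold is a ratio against the previous release, the same check keeps working as hardware or workloads change, which is the point of asking the agent for relative rather than absolute limits.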

IN THE KNOW

What’s trending on socials and headlines

Local Fine-Tune: This guide fine-tunes Google's new Gemma 4 12B on your own data, running fully locally on just 8GB (6.5K bookmarks).
Marathon Mode: Claude Code's creator shared 5 tips for running Opus autonomously for hours or even days at a time (2.1K likes).
Harness Engineering: Most agent failures trace back to the harness and not the model. This breakdown shows how top teams build one that ships production code.
The New Moat: Now that anyone can build apps, building stopped being the hard part. This founder unveils the three things that actually decide whether your app gets traction (1.1M views).
Operator Mode: Tired of your Hermes Agent asking permission for every action? This SOUL.md template fixes it.
Read Before Shipping: A senior dev approved a PR by pasting the diff into a chatbot, then shipped a race condition to prod because the AI said it was thread-safe.
Always-On AI: Thinking Machines CEO Mira Murati makes the case for why turn-based models feel clunky and what continuous, time-based AI changes about how you build (1.3M views).

AI CODING HACK

How to stop Claude Code from shipping vulnerabilities

Claude Code is fast, but it leaves security holes like injection or hardcoded secrets that you don't catch until PR review, if at all. Anthropic's security plugin reviews Claude's edits in real time and sends vulnerabilities back for an immediate fix.

To start, install it from the official marketplace and reload your session.

/plugin install security-guidance@claude-plugins-official
/reload-plugins

Set the prompt scope so it loads every time. Once it’s running, every file write triggers a scan for risky code. When a turn ends, a model double-checks the diffs.

High-severity issues are fed back to Claude for immediate fixes. On Git commit, an agentic reviewer traces data flow to catch complex bugs like IDOR or cross-file SSRF.
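To give a feel for the kind of check a per-write scan might run, here is a minimal sketch of a pattern-based scan for hardcoded secrets. It is not the plugin's implementation, which isn't described in detail here. The function name `find_hardcoded_secrets`, the two regular expressions, and the sample input are assumptions made for illustration, and a real scanner would cover far more cases.

```python
import re
from typing import List, Tuple

# Illustrative patterns only (assumptions); a real scanner would use many more.
SECRET_PATTERNS = [
    # A credential-like name assigned a quoted literal of 8+ characters.
    re.compile(
        r"""(?i)\b\w*(password|secret|api_key|token)\w*\s*=\s*["'][^"']{8,}["']"""
    ),
    # Strings shaped like a long-term AWS access key ID (AKIA + 16 characters).
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
]


def find_hardcoded_secrets(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that match a secret pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SECRET_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Hypothetical file contents: one hardcoded secret, one safe environment lookup.
sample = '''
import os
DB_PASSWORD = "hunter2-prod-2024"
api_token = os.environ["API_TOKEN"]
'''
findings = find_hardcoded_secrets(sample)
print(findings)
```

A regex pass like this only catches the obvious cases. Issues such as IDOR or cross-file SSRF depend on how data flows through the program, which is why the commit-time review described above traces data flow instead of matching patterns.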

P.S. Get 50+ AI coding hacks for Claude Code, Cursor, and Codex here.

TOP & TRENDING RESOURCES

Click here to watch the tutorial.

Top Tool

Agent Mode in Arena: Run autonomous agents that browse, research, code, and handle multi-step workflows from one prompt. Every run is tracked on the Agent Arena Leaderboard, ranking the world's best models by how they actually perform in the field.

Top Repo

/last30days (32.8K ⭐): Research skill that combines recent, community-ranked signals from various platforms into one clear brief. It searches Reddit, X, YouTube, TikTok, Hacker News, GitHub, and the web to summarize the key events from the last 30 days.

Trending Cookbook

SchemaFlow (by OpenAI): Traditional database schema changes often lead to hidden errors and lost context when handed off between teams. This cookbook addresses this by using a staged, multi-agent workflow with deterministic guardrails to safely automate and validate SQL generation.

Our most-clicked story from Friday

Check out this agent skill, which was inspired by an internal prompt a Claude Code engineer used to get a better understanding of complex coding sessions.

Grow customers & revenue: Join companies like Google, IBM, and Datadog. Showcase your product to our 300K+ engineers and 150K+ followers on socials. Get in touch.

Until next time — The Code team