
Welcome back. Two weeks ago, Anthropic shipped security scanning into Claude Code and set a new bar for what a coding agent should be able to catch. OpenAI just came back with its own answer: Codex Security.
Also: The exact 6-month roadmap to become an AI engineer, what to check before giving agents production access, and which coding habits are quietly eroding your skills.
Today’s Insights
Powerful new updates and hacks for devs
Karpathy just turned one GPU into a research lab
How to run tasks on a schedule in Claude Code
Trending social posts, top repos, and more

TODAY IN PROGRAMMING
OpenAI ships Codex Security to catch real vulnerabilities: The ChatGPT maker just released Codex Security, an agentic tool that scans repositories, builds threat models, and proposes context-aware patches for real vulnerabilities. Unlike standard scanners that often overwhelm teams with false positives, the tool uses sandboxed validation to pressure-test its findings. But that isn’t the only thing: OpenAI also launched Codex for Open Source, giving core maintainers free API credits and six months of ChatGPT Pro.
Anthropic adds background automation and a new app marketplace: The AI lab shipped local scheduled tasks in Claude Code Desktop, letting developers automate recurring jobs like dependency audits and PR triage that run while the machine is awake. Separately, it launched Claude Marketplace, a one-stop shop for enterprises to procure Claude-powered tools from partners including GitLab, Replit, and Snowflake.
Luma's new model brings reasoning into image generation: The AI video startup launched Uni-1, a unified model that understands and generates images in a single pass rather than treating them as separate tasks. It leads the RISEBench leaderboard overall, topping both Nano Banana 2 and GPT Image 1.5. Powered by Uni-1, Luma Agents are now available via API for end-to-end creative workflows.

INSIGHT
Karpathy just turned one GPU into a research lab

Source: The Code, Superhuman.
The experimentation wall just came down. ML research has always been gated by one thing: the speed at which you can run experiments. Over the weekend, OpenAI cofounder Andrej Karpathy cracked that wall open with autoresearch, a repo that puts an AI agent in charge of your ML research loop, automatically rewriting training code and keeping only the changes that improve the model.
The proof came before the hype. Shopify CEO Tobi Lutke ran it before any ML team. He pointed the agent at his query-expansion model, went to bed, and woke up to 37 completed experiments — and a 0.8b parameter model beating his previous 1.6b by 19%.
You can run this yourself. Simply clone the repo, upload your training data, and create a “program.md” file outlining your research objectives. Once you point the agent to your GPU, it takes over the rest: editing your Python training scripts, running 5-minute experiments, tracking validation loss, and discarding failures automatically. To help you get up and running, here is a deployment guide.
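The setup steps above can be sketched as a few shell commands. This is a minimal illustration, not verified against the repo: the file names, directory layout, and the commented-out entrypoint are assumptions based on the description here, so check the repo's README for the real invocation.

```shell
# Sketch of the autoresearch workflow described above.
# Repo URL, flags, and entrypoint are assumptions -- consult the README.
mkdir -p autoresearch-demo && cd autoresearch-demo

# 1. Describe your research objective in program.md; the agent reads
#    this file to decide which experiments to attempt.
cat > program.md <<'EOF'
Goal: improve validation loss of the query-expansion model.
Constraints: each experiment must finish within 5 minutes on one GPU.
Keep a change only if validation loss improves; otherwise discard it.
EOF

# 2. Place your training data and script alongside it.
#    (train.py and data/ are placeholders for your own assets.)

# 3. Point the agent at the project and your GPU. This invocation is
#    hypothetical -- the real entrypoint may differ:
# python autoresearch.py --project . --gpu 0
```

From there, the loop the newsletter describes (edit script, run a short experiment, compare validation loss, keep or discard) runs without further input.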
GitHub wasn't built for what comes next. Karpathy's next step is SETI@home-scale collaboration: agents running asynchronously across thousands of branches on shared compute. He's already flagged that existing tooling will crack under that load. The moat has shifted from who employs the most researchers to who writes the best instructions for agents that never stop.

IN THE KNOW
What’s trending on socials and headlines

Meme of the day.
Plausible ≠ Correct: LLM code can clear every test you throw at it and still be catastrophically wrong in production. This breakdown shows exactly how it happens.
AI Chief of Staff: A solo consultant used Claude Code to automate 195 hours of annual busywork into a two-button morning routine and documented the whole build (3.5M views).
Reality Check: Most devs are confident going into system design interviews. These 20+ problems are a reality check.
$32K Weekend: A developer was paying $32,000/year for a SaaS tool, decided to rebuild the whole thing in a weekend with Python and Claude, and posted the full code.
Zero to Shipped: AI agents are the most in-demand dev skill of 2026. This is the exact 6-month plan engineers are saving to get there (1M views).
Senior Dev Mode: Drop this prompt into Claude Code and watch the way it plans, thinks through problems, and self-corrects change entirely.
Hard Lesson: A dev gave Claude Code root access to his AWS setup. It followed instructions perfectly and wiped 2.5 years of database records. Must-read before giving agents production access (2.9M views).
Skill Decay: Anthropic studied how developers use AI coding tools and found 3 patterns that build skills and 3 that quietly destroy them. The ones that destroy are the defaults (2.1M views).

AI CODING HACK
How to run tasks on a schedule in Claude Code
Claude Code just launched scheduled tasks, giving you the ability to run prompts on repeat in the background while your session stays active. This is an upgrade if you've ever started a deployment and spent the next hour constantly tabbing back to monitor its progress.
The easiest way to get started is by using “/loop”. Just provide a specific interval and your prompt:
/loop 5m check if the deployment is healthy
/loop 15m scan my error logs and flag anything new
/loop 30m check if CI passed on main
If you skip the interval, it defaults to every 10 minutes. For one-off reminders, you can skip /loop entirely and simply describe what you need:
remind me at 3pm to push the release branch
Tasks only run while Claude Code is active; if you close the session, everything stops. To keep them running continuously, you can leave a session open on a server or use an external cron job to launch new Claude Code sessions on a schedule.
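For the cron-job approach, a crontab entry like the following is one way to do it. This is a sketch under assumptions: the exact CLI flags and log path are illustrative, so verify the headless invocation against `claude --help` before relying on it.

```shell
# Hypothetical crontab entry (add via `crontab -e`) that launches a
# fresh headless Claude Code run every hour. The -p flag runs a single
# prompt non-interactively; paths and schedule are placeholders.
#
#   0 * * * * cd /path/to/repo && claude -p "scan my error logs and flag anything new" >> ~/claude-cron.log 2>&1
```

Unlike /loop, each cron run is a new session, so it won't carry context from previous runs; keep the prompt self-contained.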

TOP & TRENDING RESOURCES
Top Tutorial
How a staff engineer uses AI for spec-driven development: This tutorial shows how to speed up web development with AI agents. You'll learn how to turn Figma designs directly into functional websites, run multiple AI coding tasks in parallel, and use isolated workspaces for fast, hassle-free code reviews.
Top Repo
AI engineering field guide: A collection of real job postings, interview processes, take-home assignments, learning paths, and portfolio ideas for AI engineers.
Trending Paper
Evaluating Opus 4.6 on BrowseComp (by Anthropic): Public AI benchmarks are often compromised because test answers frequently leak online. In a surprising turn of events, Claude Opus 4.6 independently recognized it was being tested, tracked down the specific benchmark it was taking, and decrypted the hidden answer key.
Grow customers & revenue: Join companies like Google, IBM, and Datadog. Showcase your product to our 200K+ engineers and 100K+ followers on socials. Get in touch.
Whenever you’re ready to take the next step
What did you think of today's newsletter?
You can also reply directly to this email if you have suggestions, feedback, or questions.
Until next time — The Code team


