
Welcome back. If you’re overwhelmed with talk of AI agents being the next big thing but don’t know where to start — we’ve got your back. Check out our simple and detailed guide to building AI agents here.
Today: Google launches API for coding automation, choose the right AI-powered IDE for your work, and learn how to set up AI evals from scratch.
Today’s Insights
Perplexity’s Comet browser can compromise your data
AI taxonomy guide, and how to adopt AI on your team
LLM eval guide, building agents in Python, and more
Trending social posts, top repos, new research & more
Welcome to The Code. This is a 2x weekly email that cuts through the noise to help devs, engineers, and technical leaders find high-signal news, releases, and resources in 5 minutes or less. You can sign up or share this email here.

THIS WEEK IN PROGRAMMING
Perplexity's Comet browser vulnerable to one-click data theft: A new attack called CometJacking can turn Perplexity's viral new AI browser into a data exfiltration tool with just one malicious link. Security researchers at LayerX demonstrated how embedded prompts in URLs can instruct Comet's AI to extract emails, calendar events, and other sensitive information from authorized services without the user's knowledge.
Google launches Jules API for programmatic coding automation: The search giant just released an API and CLI tools for Jules, its AI coding agent, letting developers automate tasks across the entire software development lifecycle. The new API enables custom integrations like fixing bugs directly from Slack channels, automating backlog triage, and connecting Jules to CI/CD pipelines in GitHub Actions.
IBM makes agentic AI deployments radically cheaper: IBM just released Granite 4.0, a family of enterprise models built specifically for the resource-hungry demands of agentic workflows. The hybrid architecture slashes memory requirements by over 70% when handling long contexts and multiple concurrent sessions—exactly what multi-agent systems need.

TRENDS & INSIGHTS
What Engineering Leaders Need to Know This Week

Source: The Code, Superhuman
Complete AI Testing Taxonomy Guide: The framework covers 7 categories of AI testing — everything from model performance and functional testing to chaos testing and security — with specific metrics, timing, and example tests for each.
Scaling Engineering Teams: Lessons from Google, Facebook, and Netflix: Drawing from over a decade at Google, Facebook, and Netflix, engineering leader Ido Green just published a guide to scaling teams without losing your best people. The hard truth: manual processes and ad-hoc communication collapse at scale, while boring stuff like documentation and automation saves weekends.
The EM’s guide to AI adoption (without your engineers hating it): A new guide breaks down how engineering teams can successfully adopt AI tools without frustrating developers. Key insight: teams that succeed focus on proper context setup—creating specific rules files and configuring MCP servers so AI understands their codebase.
Choosing the Right AI IDE for Your Team: Dive into a hands-on comparison of Cursor, Windsurf, and Copilot with GPT-5, highlighting their strengths in greenfield and brownfield projects.

IN THE KNOW
What’s trending on socials and headlines

Meme of the day
Developer Superpowers: An engineer claims they achieved 5x the throughput of their colleagues with this AI-assisted workflow.
Secret Agent: OpenAI is reportedly set to release a new agent builder platform today.
Codex vs. Sonnet: Developers debate the best coding tool — see who wins.
Robust Systems: This article explains everything you need to know about system design.
React 19.2 is now available.
Tencent’s new model has taken the #1 spot in LMArena, ranked as both the top overall and top open-source Text-to-Image model.
Replit launched Connectors — letting devs build apps and automations that integrate seamlessly with the tools they already use.
Qwen’s 3B-parameter model is rivaling GPT-5-Mini and Claude Sonnet 4 — and often beating them.

TOP & TRENDING RESOURCES
3 Tutorials to Level Up Your Skills

Source: Sebastian Raschka
Understanding the 4 Main Approaches to LLM Evaluation (From Scratch): AI researcher Sebastian Raschka just published an in-depth breakdown of how AI labs actually measure whether their models are improving. The 40-page technical guide covers everything from multiple-choice tests (like MMLU) to having AI models judge each other's responses.
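The multiple-choice approach the guide opens with boils down to comparing a model's chosen answer letter against a gold letter and computing accuracy. Here's a minimal, self-contained sketch of MMLU-style scoring — the items and predictions are made-up stand-ins, not data from the guide:

```python
# Minimal sketch of MMLU-style multiple-choice scoring (hypothetical data).
# A real harness would fill `predictions` with letters parsed from model output.

def score_multiple_choice(items, predictions):
    """Accuracy: fraction of items where the predicted letter matches the gold letter."""
    correct = sum(
        1 for item, pred in zip(items, predictions)
        if pred.strip().upper() == item["answer"]
    )
    return correct / len(items)

items = [
    {"question": "2 + 2 = ?", "options": {"A": "3", "B": "4"}, "answer": "B"},
    {"question": "Capital of France?", "options": {"A": "Paris", "B": "Rome"}, "answer": "A"},
]
predictions = ["B", "B"]  # stand-in for model outputs

print(score_multiple_choice(items, predictions))  # 0.5
```

The other approaches (perplexity, human preference, LLM-as-judge) trade this exactness for coverage of open-ended responses.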
Building AI Agents from Scratch in Python: This practical tutorial walks developers through building a functional AI agent using only Python and direct LLM API calls—no frameworks required. The approach emphasizes understanding fundamentals before adopting framework abstractions.
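The framework-free pattern that tutorial teaches is, at its core, a loop: ask the model, run any tool it requests, feed the result back, repeat until it answers. Here's a sketch of that control flow with a stubbed `fake_llm` in place of a real API call — all names are hypothetical, not the tutorial's own code:

```python
# Sketch of a framework-free agent loop (all names hypothetical).
# `fake_llm` stands in for a real LLM API call: it first requests the `add`
# tool, then produces a final answer from the tool's result.

import json

TOOLS = {"add": lambda a, b: a + b}

def fake_llm(messages):
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "add", "args": {"a": 2, "b": 3}})
    result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return json.dumps({"final": f"The sum is {result}"})

def run_agent(task, max_steps=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = json.loads(fake_llm(messages))
        if "final" in reply:                  # model says it's done
            return reply["final"]
        fn = TOOLS[reply["tool"]]             # dispatch the requested tool
        result = fn(**reply["args"])
        messages.append({"role": "tool", "content": str(result)})
    return "step limit reached"

print(run_agent("What is 2 + 3?"))  # The sum is 5
```

Swapping `fake_llm` for a real chat-completion call is essentially the whole jump from this sketch to a working agent, which is the tutorial's point about understanding fundamentals first.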
Redis 101 — From a Beginner’s POV: This comprehensive technical guide demystifies Redis architecture, from basic single-instance setups to advanced cluster configurations. Perfect for developers looking to implement caching or build high-availability systems.
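The caching use case that guide covers usually follows the cache-aside pattern: check the cache, fall back to the database on a miss, and populate the cache on the way out. A self-contained sketch below uses a plain dict as a stand-in for Redis; with redis-py you would swap in `r.get` / `r.setex`:

```python
# Cache-aside pattern sketch. A dict stands in for Redis so the example is
# self-contained; with redis-py you'd use r.get(key) / r.setex(key, ttl, value).

cache = {}
db_reads = 0  # counts how often we hit the "database"

def slow_db_lookup(user_id):
    global db_reads
    db_reads += 1
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:              # cache hit: skip the database entirely
        return cache[key]
    value = slow_db_lookup(user_id)
    cache[key] = value            # in Redis: r.setex(key, 300, json.dumps(value))
    return value

get_user(42)
get_user(42)
print(db_reads)  # 1 — the second call was served from the cache
```

The TTL you'd pass to `setex` is the main knob: it bounds how stale a cached entry can get, which matters more than raw hit rate in most systems.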
Top Repos
agents: A 6-week journey to code and deploy AI agents with the OpenAI Agents SDK, CrewAI, LangGraph, AutoGen, and MCP.
LLaMA-Factory: Easily fine-tune 100+ large language models with zero-code CLI and Web UI.
ml-engineering: An open collection of methodologies, tools, and step-by-step instructions for training, fine-tuning, and serving large language and multi-modal models.
Trending Papers
A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning: This study systematically analyzes multi-turn RL for LLM agents across environment, policy, and reward design. Findings show cross-task generalization, that biased algorithms outperform unbiased ones, and that balancing demonstrations with RL maximizes performance under fixed budgets.
Advancing theoretical computer science with AlphaEvolve: Google's AlphaEvolve uses LLMs to discover optimized mathematical structures for complexity theory proofs. It improved MAX-4-CUT approximation bounds and found 163-node Ramanujan graphs, enabling AI-verified advances in theoretical computer science research.
Weight Transfer for RL Post-Training in under 2 seconds: Perplexity researchers achieve 1.3-second weight transfers for trillion-parameter models during RL training using RDMA point-to-point communication. The system enables ultra-fast GPU synchronization through one-sided transfers, static scheduling, and pipelined execution, making large-scale reinforcement learning fine-tuning more efficient for developers.
Whenever you’re ready to take the next step
What did you think of today's newsletter?
You can also reply directly to this email if you have suggestions, feedback, or questions.
Until next time — The Code team