
Every programmer and technical leader has the same problem.
Our careers depend on keeping our skills current, but we're too busy shipping code to do it. From ICs shipping features to CTOs making build-vs-buy decisions that impact millions in budget, no productive programmer or technical leader has 5 hours a day to comb through every Hacker News post, tweet, subreddit, and arXiv paper to find the stuff that really matters.
Welcome to The Code. This is a short, once-a-week email that cuts through the noise to help you find high-signal news, releases, and resources in 10 minutes or less.
This email is written by developers and technical leaders who have built and exited startups based on frontier technologies and worked at Fortune 1000 companies. It contains the weekly insights our team uses to level up and ship better code and products.
P.S. This is our first issue. And we’re writing it just for you! We want to curate and create insights that help you level up. Tell us what you think and how we can help you by replying to this email or responding to the poll at the bottom of this newsletter.
THIS WEEK IN PROGRAMMING
1. Alibaba launches trillion-parameter Qwen3 model and shares its playbook to cut outages and cloud costs: The Chinese tech giant released Qwen3-Max-Preview, its largest model yet, packing over 1 trillion parameters and a 262K-token context window and surpassing Claude Opus 4 and DeepSeek-V3.1 on SuperGPQA, AIME25, and coding benchmarks. Alibaba also published a paper claiming it reduced network outages by 92% by offloading workloads to idle infrastructure.
2. AI-powered malware hits 2,180 GitHub accounts in “s1ngularity” attack: Attackers compromised popular Nx build system packages on NPM, deploying malware that hijacked AI CLI tools like Claude, Gemini, and Q to systematically harvest GitHub tokens, SSH keys, and cryptocurrency wallets from developer machines. The three-phase attack exposed 7,200 repositories across 2,180 accounts, with stolen credentials uploaded to the public "s1ngularity-repository".
3. OpenRouter launches two models with 2-million-token context windows: OpenRouter's Sonoma Sky Alpha and Sonoma Dusk Alpha offer 2-million-token context windows and free access during alpha testing. Both frontier models support image inputs and parallel tool calling, with Sky Alpha positioned as "maximally intelligent" and Dusk Alpha as "fast and intelligent" for general-purpose tasks. They let developers explore ultra-long-context capabilities without cost barriers.
4. Google releases EmbeddingGemma, a 308M parameter embedding model: Google's new embedding model features a bi-directional attention architecture and a 2K context window, supports over 100 languages, and ranks highest among sub-500M models on MTEB benchmarks. Developers can use this guide for practical implementation examples across frameworks like Sentence Transformers, LangChain, and Transformers.js.
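For a quick feel of what working with an embedding model like EmbeddingGemma looks like, here is a minimal Sentence Transformers sketch. The model identifier and text snippets are placeholders we chose for illustration, so check Google's guide for the exact Hub name and any prompt conventions the model expects.

```python
# Minimal sketch: embed documents and rank them against a query with Sentence Transformers.
# The model ID below is an assumed placeholder; see Google's guide for the exact name.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")  # placeholder model ID

docs = [
    "Karpenter provisions right-sized Kubernetes nodes on demand.",
    "vLLM serves large language models with high throughput.",
]
query = "How do I cut Kubernetes compute costs?"

doc_embeddings = model.encode(docs)    # one vector per document
query_embedding = model.encode(query)  # single query vector

# Cosine similarity between the query and each document; the highest score is the best match.
print(util.cos_sim(query_embedding, doc_embeddings))
```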
TRENDS & INSIGHTS
What Engineering Leaders Need to Know This Week

1. A Field Guide to Rapidly Improving AI Products: This detailed guide explains real-world ML challenges beyond textbook theory, covering everything from data pipelines to production monitoring. The author shares hard-earned lessons from deploying models at scale, helping readers navigate common pitfalls like data drift and model versioning.
2. How I Cut AWS Compute Costs by 70% with a Multi-Arch EKS Cluster and Karpenter: An engineer found that Karpenter's intelligent node provisioning could cut compute costs by 70% while making scaling nearly instantaneous. The multi-architecture approach also delivered better performance per dollar for some workloads.
3. OpenAI's 5-step plan for leadership teams to stay ahead in AI: OpenAI's guide breaks down how to align teams, activate learning, amplify wins, accelerate decisions, and govern responsibly to build AI-first organizations.
4. How LinkedIn runs 50+ AI features using vLLM: LinkedIn adopted the open-source vLLM library to power everything from job search to hiring tools. The deployment spans thousands of servers and relies on vLLM's memory-efficient serving (notably PagedAttention) to keep responses fast and inference costs down.
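LinkedIn's production setup is far more elaborate, but the core vLLM offline-inference API is small. Below is a minimal sketch of batched generation; the model name is a placeholder we picked for illustration, not a claim about what LinkedIn runs.

```python
# Minimal vLLM offline-inference sketch. The model name is a placeholder,
# not a statement about LinkedIn's deployment.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")  # any Hugging Face causal LM vLLM supports
params = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Summarize this job description in one sentence: ...",
    "Suggest three interview questions for a data engineer role.",
]

# vLLM batches the prompts and manages KV-cache memory internally (PagedAttention).
outputs = llm.generate(prompts, params)
for out in outputs:
    print(out.outputs[0].text)
```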
IN THE KNOW
What’s trending on social media and in the headlines today

📖 Book Worm: A senior engineer at Google just dropped a 400-page free book on Agentic Design Patterns.
👀 My Way Or The Highway: Coinbase CEO explains why he fired engineers who didn’t try AI immediately.
🧑‍💻 Code King: Andrej Karpathy praises GPT-5 Pro's superior problem-solving capabilities after it solved complex coding challenges that stumped Claude for hours.
🏢 Job Guide: Research engineer Aleksa Gordić shares how to get a job at Google DeepMind without a machine learning degree.
DEEP DIVE
How Companies Are Using Multi-Agent Orchestration

Source: The Code, Superhuman AI
Multi-agent orchestration coordinates specialized AI agents that work together, with a central coordinator managing collaboration to solve complex problems. For example, when applied to customer service, one agent handles initial queries while specialized agents manage technical support, billing, or product questions.
Key Features:
Intelligent routing — Classifier analyzes requests and selects the most suitable agent based on task requirements
Specialized agents — Each agent is optimized for specific domains (coding, analysis, creative writing, data processing)
Persistent memory — Maintains conversation history between agents, ensuring context preservation
Dynamic team composition — Lead agent activates different team members as needed
Bidirectional communication — Agents share information with each other and the coordinator
How It Works:
Classifier evaluates user requests and identifies needed specialized agents
Lead agent acts as the project manager, receiving classified requests and planning the workflow
Team agents process assigned portions independently based on requirements
Results flow through the lead agent, which synthesizes outputs into unified responses
Two memory systems track interactions — one for user conversations, another for team communications
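To make the flow above concrete, here is a framework-free Python sketch of the pattern: a classifier routes the request, a lead agent delegates to a specialist and records team-side memory, and the result is synthesized into one response. Every name here is hypothetical, and a real system would back each agent and the classifier with LLM calls instead of the stubs shown.

```python
# Hypothetical, framework-free sketch of the orchestration flow described above.
# In a real system each agent would wrap an LLM call; here they are simple stubs.
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    domain: str  # e.g. "billing", "technical", "general"

    def handle(self, task: str) -> str:
        # Stub for an LLM-backed specialist; returns a labeled result.
        return f"[{self.name}] handled: {task}"

@dataclass
class LeadAgent:
    team: dict[str, Agent]
    team_memory: list[str] = field(default_factory=list)  # inter-agent communications

    def run(self, domain: str, request: str) -> str:
        agent = self.team.get(domain, self.team["general"])
        result = agent.handle(request)
        self.team_memory.append(result)  # persist team-side context
        # Synthesize the specialist output into a single user-facing response.
        return f"Answer (via {agent.domain} specialist): {result}"

def classify(request: str) -> str:
    # Stand-in for an LLM or trained classifier that picks the right specialist.
    text = request.lower()
    if "invoice" in text or "charge" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"

team = {
    "billing": Agent("BillingBot", "billing"),
    "technical": Agent("SupportBot", "technical"),
    "general": Agent("GeneralBot", "general"),
}
lead = LeadAgent(team)

user_memory: list[str] = []  # user-conversation memory, kept separate from team memory
request = "I was charged twice on my last invoice."
reply = lead.run(classify(request), request)
user_memory.extend([f"user: {request}", f"assistant: {reply}"])
print(reply)
```

The frameworks listed under "How to Build It" package this same routing, delegation, and memory management with production concerns (streaming, retries, observability) already handled.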
How to Build It:
OpenAI Agents SDK: OpenAI’s framework for building coordinated agent systems
AWS’s Agent Squad: Flexible framework for managing multiple AI agents and handling complex conversations
Semantic Kernel Multi-Agent: Microsoft's approach to orchestrating multiple AI agents
LangChain Multi-Agent Workflows: Tutorial for agent collaboration and workflow management on LangChain
TOP & TRENDING RESOURCES
3 Tutorials to Level Up Your Skills
Claude Code Best Practices: Anthropic describes Claude Code as a terminal-based coding agent that explores codebases on the fly rather than indexing them. Key practices include using CLAUDE.md files for context, smart permission management, planning workflows before coding, and leveraging CLI integrations.
How to build with Nano Banana: Google’s official guide provides a comprehensive walkthrough for developers looking to integrate Gemini 2.5 Flash Image (Nano Banana) into their apps using the Gemini Developer API.
Ship AI Products with Confidence: This free 17-email course introduces concepts from the Analyze-Measure-Improve lifecycle. It’s a framework for building and maintaining high-quality LLM applications, based on questions from over 2,000 engineers and PMs from companies including Google, OpenAI, and Meta.
Trending Papers
Why Language Models Hallucinate: The paper from OpenAI researchers shows that hallucinations stem from how models are trained and tested: they’re rewarded for confident guesses, even when uncertain. The authors recommend reforming benchmarks to reward uncertainty for more reliable outputs.
Small Language Models are the future of Agentic AI: Nvidia researchers demonstrate how smaller models excel at specialized, repetitive tasks within agent frameworks. They present compelling economic and performance arguments for SLMs in production environments.
Deep Think with Confidence: The paper discusses a lightweight test-time method that uses model-intrinsic confidence to prune weak reasoning paths, improving both accuracy and token efficiency for self-consistency ensembles.
Top Repos
Agentic AI Crash Course: A structured repo for learning agentic AI
Awesome Context Engineering: A guide to the evolution from static prompting to dynamic, context-aware AI systems.
The Algorithms: Compilation of all algorithms implemented in Python
What did you think of today's newsletter?
Until next time — The Code team