6-month-old startup beats Gemini

Welcome back. Nobody thought this was possible. Six-month-old startup Poetiq built a meta-system that learns to guide existing models to solve problems they previously couldn't, and outscored Google on ARC-AGI-2, a major AGI reasoning benchmark, for half the cost along the way.

Also: How to prompt Claude Code 10x better, how to get the most out of Claude Opus 4.5 and how to get more done with fewer engineers.

Today’s Insights

New models and features for devs
8 strategies for boosting engineering pace
How to write a better Claude.md file
Trending social posts, top repos, new research & more

Welcome to The Code. This is a 2x weekly email that cuts through the noise to help devs, engineers, and technical leaders find high-signal news, releases, and resources in 5 minutes or less. You can sign up or share this email here.

TODAY IN TECH

Poetiq outscores Gemini 3 Deep Think. Source: Poetiq

Tiny startup outpaces Google on key reasoning benchmark: Poetiq, a six-person team, reportedly outscored Google's Gemini 3 Deep Think on the ARC-AGI-2 reasoning benchmark. The company achieved 54% accuracy at $30 per task using a meta-system that applies learned test-time reasoning to optimize solutions without fine-tuning the underlying models. The approach orchestrates existing frontier models, suggesting that architectural innovation, rather than just massive compute budgets, can drive progress in AI reasoning.

Alibaba drops major Qwen3-TTS upgrade with 49+ voice options: The Chinese e-commerce giant just gave developers a lot more to work with. The latest Qwen3-TTS update features 49+ timbres spanning genders, ages, and character roles, plus support for 10 languages and 9 Chinese dialects. In benchmarks, it posts lower word-error rates than MiniMax, ElevenLabs, and GPT-4o-Audio-Preview. The update also improves prosody for more natural-sounding speech. You can try the demo here.

Microsoft drops VibeVoice, a TTS system built for podcasts: Long-form, multi-speaker audio usually breaks down after a few minutes as voices drift and tones collapse. VibeVoice addresses this with continuous speech tokenizers running at 7.5 Hz, keeping up to four speakers stable for as long as 90 minutes. Its 0.5B model delivers real-time output with a latency of approximately 300 ms. The system is now available on GitHub and Hugging Face.

TRENDS & INSIGHTS

What Engineering Leaders Need to Know This Week

How to get more done with fewer engineers: What if you could build a multi-million dollar software company where only 10% of your employees are developers? AFAS, a company with hundreds of millions in revenue, does exactly that with a lean team of just 70 engineers. In this podcast, Engineering Manager Michiel Overeem pulls back the curtain on their unconventional strategies for achieving massive productivity with a surprisingly small team.

8 strategies for high software engineering pace: Just working harder isn’t going to deliver software faster. Engineering consultant Jim Grey's new blog series breaks down eight actionable strategies for boosting engineering pace, including AI adoption, ruthless process optimization, and smarter approaches to technical debt. It's a must-read for anyone managing dev teams under pressure to deliver more.

What actually makes you senior: Architecture, communication, ownership — we've heard it all. But this post makes the case that senior engineers stand out by reducing ambiguity, turning vague requirements into actionable plans. It's a useful lens for engineering leaders rethinking how they spot (and grow) top talent.

IN THE KNOW

What’s trending on socials and headlines

Claude Tips: The Claude team shared 5 techniques for getting better results from Opus 4.5, backed by extensive internal testing.
Reframe This: Karpathy's viral post explains why you should treat LLMs as simulators, not entities. Ask them to channel specific viewpoints instead of asking for their "opinion."
Opus Playbook: This viral post (1.3M views) breaks down how to get the most out of Opus 4.5—from using plan mode to building agent swarms.
Design Skills: Learn how you can create stunning websites with AI from start to finish, prompts included.

OpenAI and Instacart launch an in-chat app for meal planning and grocery orders in ChatGPT, with built-in Instant Checkout powered by Stripe payments.
Devs can now delegate tasks to Claude Code directly from Slack.
Google's NotebookLM expands mobile functionality with visual uploads, slide support, and synced audio progress.
Anthropic surveys engineers and analyzes Claude Code sessions to understand AI’s effects on internal workflows.

TOP & TRENDING RESOURCES

3 Tutorials to Level Up Your Skills

Anthropic reveals how to prompt Claude Code 10x better: In this tutorial, devs can learn specific strategies for prompting Claude 4.5 models and Claude Code. It covers using clear directives instead of aggressive commands, constraining requests to prevent over-engineering, and explaining the context behind rules.

Best practices for writing CLAUDE.md files: In this tutorial, devs learn how to get better results from Claude Code by keeping instruction files under 300 lines and using "progressive disclosure," storing task-specific details in separate markdown files that Claude reads only when needed.

Building coding agents with tool execution: In this course, you’ll build agents that write and execute code to accomplish tasks, going far beyond predefined function calls. Instead of limiting your agent to a fixed set of tools, you’ll let it access Python’s entire ecosystem and write multi-step code sequences.
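
For the CLAUDE.md tutorial above, the 300-line guideline is easy to check automatically. Here is a minimal Python sketch, assuming the instruction file is plain UTF-8 text. The function names and the "CLAUDE.md" path are our own illustration, not part of the tutorial or of Claude Code.

```python
from pathlib import Path

# Guideline quoted in the tutorial summary above; adjust to taste.
MAX_LINES = 300


def count_lines(text: str) -> int:
    """Return the number of lines in an instruction file's text."""
    return len(text.splitlines())


def within_limit(text: str, max_lines: int = MAX_LINES) -> bool:
    """True when the text stays at or under the line budget."""
    return count_lines(text) <= max_lines


def check_file(path: str, max_lines: int = MAX_LINES) -> str:
    """Read a file (hypothetical path) and describe whether it fits the budget."""
    text = Path(path).read_text(encoding="utf-8")
    lines = count_lines(text)
    status = "OK" if lines <= max_lines else "too long"
    return f"{path}: {lines} lines ({status}, limit {max_lines})"
```

Calling print(check_file("CLAUDE.md")) from a repo root would report the file's length against the budget. If it runs long, the tutorial's advice is to move task-specific details into separate markdown files.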

Top Repos

All-agentic-architectures: This repo contains implementations of 17+ agentic architectures designed for practical use across different stages of AI system development.
AI-engineering-hub: This repo contains in-depth tutorials on LLMs, RAG, and real-world AI agent applications.
Claude-quickstarts: A collection of projects designed to help developers quickly get started with building deployable applications using the Claude API.

Trending Papers

Measuring agents in production: This paper discusses the reality of deploying AI agents, addressing critical reliability challenges. It finds that production systems succeed through deliberate simplicity and constrained workflows rather than the complex, open-ended autonomy often assumed in hype cycles.

The missing layer of AGI: This paper discusses the "coordination layer" required to transform LLM pattern matching into reliable AGI, formalizing reasoning as a phase transition via UCCT (Unified Contextual Control Theory) and implementing it through the MACI multi-agent architecture.

A comprehensive survey on integrating large language models with knowledge-based methods: This paper discusses how integrating LLMs with knowledge bases addresses limitations like hallucinations and outdated information. It finds that combining techniques like RAG and Knowledge Graphs significantly enhances model accuracy, reasoning, and reliability.

What did you think of today's newsletter?

Your feedback helps us create better emails for you!

You can also reply directly to this email if you have suggestions, feedback, or questions.

Until next time — The Code team