Original: Lenny Rachitsky · 11/02/2026

Summary

I put the newest AI coding models from OpenAI and Anthropic head-to-head, testing them on real engineering work I’m actually doing. I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some genuinely gnarly components.

Key Insights

“I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some genuinely gnarly components.” — Introduction to the AI models being compared and the tasks they were assigned.
“Why Codex excels at code review but struggles with creative, greenfield work.” — Discussion on Codex’s strengths and weaknesses.
“The surprising way Opus and Codex complement each other in a real-world engineering workflow.” — Insight into how different AI models can be used together effectively.

Full Article

I put the newest AI coding models from OpenAI and Anthropic head-to-head, testing them on real engineering work I’m actually doing. I compare GPT-5.3 Codex with Opus 4.6 (and Opus 4.6 Fast) by asking them to redesign my marketing website and refactor some genuinely gnarly components. Through side-by-side experiments, I break down where each model shines (creative development versus code review) and share how I’m thinking about combining them to build a more effective AI engineering stack.

Listen on YouTube, Spotify, or Apple Podcasts

What you’ll learn:

  1. The strengths and weaknesses of OpenAI’s Codex vs. Anthropic’s Opus for different coding tasks
  2. How I shipped 44 PRs containing 98 commits across 1,088 files in just five days using these models
  3. Why Codex excels at code review but struggles with creative, greenfield work
  4. The surprising way Opus and Codex complement each other in a real-world engineering workflow
  5. How to use Git concepts like worktrees to maximize productivity with AI coding assistants
  6. Why Opus 4.6 Fast might be worth the 6x price increase (but be careful with your token budget)
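
To make the Git worktree idea from the list above concrete, here is a minimal sketch. It is an illustration only, not the workflow shown in the episode. It builds one `git worktree add` command per task so that each AI coding session gets its own branch and its own checkout directory. The function name `worktree_commands`, the `ai/<task>` branch naming, the sibling-directory layout, and the `main` base branch are all assumptions made for this example.

```python
from pathlib import Path


def worktree_commands(repo: Path, tasks: list[str], base: str = "main") -> list[list[str]]:
    """Build one `git worktree add` command per task (nothing is executed here).

    Each task gets its own branch and its own directory next to the main
    checkout, so parallel sessions never share a working directory.
    """
    commands = []
    for task in tasks:
        branch = f"ai/{task}"                       # hypothetical branch naming scheme
        path = repo.parent / f"{repo.name}-{task}"  # hypothetical directory layout
        # Real git syntax: git worktree add -b <new-branch> <path> <commit-ish>
        commands.append(
            ["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path), base]
        )
    return commands
```

Calling `worktree_commands(Path("/work/site"), ["redesign", "refactor"])` returns two argument lists that could be passed to `subprocess.run`. Once a branch is merged, `git worktree remove <path>` cleans up its directory.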

Brought to you by:

WorkOS—Make your app enterprise-ready today

In this episode, we cover:

(00:00) Introduction to new AI coding models
(02:13) My test methodology for comparing models
(03:30) Codex’s unique features: Git primitives, skills, and automations
(09:05) Testing GPT-5.3 Codex on a website redesign task
(10:40) Challenges with Codex’s literal interpretation of prompts
(15:00) Comparing the before and after with Codex
(16:23) Testing Opus 4.6 on the same website redesign task
(20:56) Comparing the visual results of both models
(21:30) Real-world engineering impact: 44 PRs in five days
(23:03) Refactoring components with Opus 4.6
(24:30) Using Codex for code review and architectural analysis
(26:55) Cost considerations for Opus 4.6 Fast
(28:52) Conclusion

Tools referenced:

• OpenAI’s GPT-5.3 Codex
• Anthropic’s Claude Opus 4.6
• Cursor
• GitHub

Other references:

• Tailwind CSS
• Git
• Bugbot

Where to find Claire Vo:

• ChatPRD
• Website
• LinkedIn
• X
Production and marketing by . For inquiries about sponsoring the podcast, email jordan@penname.co.
