Original: Swyx · 02/03/2026

Summary

The article discusses the limitations of traditional code reviews in the age of AI, advocating for a shift towards reviewing specifications and intent rather than code itself.

Key Insights

“We cannot consume this much code.” — Discussing the overwhelming volume of code changes and the inefficiency of manual reviews.
“The answer is to move the human checkpoint upstream.” — Proposing a shift in focus from code review to reviewing specifications and intent.
“Trust is layered.” — Explaining the need for multiple verification layers in the AI-driven development process.

Full Article

Second wave speakers for AIE Europe and the CFP for AIE World's Fair are announced today, and OpenCode is confirmed for Miami! We'll also be in Melbourne & Singapore.

Editor: This is the latest in our guest post program, where we publish AI Engineering essays worth considering even if we don't personally agree with them. Having just shipped an AI review tool, this is one of those cases where I am not there yet, but it is clearly on the horizon, and I am happy for Ankit to argue the case!

Humans already couldn't keep up with code review when humans wrote code at human speed. Every engineering org I've talked to has the same dirty secret: PRs sitting for days, rubber-stamp approvals, and reviewers skimming 500-line diffs because they have their own work to do.

We tell ourselves it is a quality gate, but teams have shipped without line-by-line review for decades. "Code review wasn't even ubiquitous until around 2012-2014," one veteran engineer told me; "there just aren't enough of us around to remember."

And even with reviews, things break. We have learned to build systems that handle failure because we accepted that review alone wasn't enough. It shows up as feature flags, staged rollouts, and instant rollbacks.

We have to give up on reading all the code

Teams with high AI adoption complete 21% more tasks and merge 98% more pull requests, but PR review time increases 91%, based on data from over 10,000 developers across 1,255 teams.

Two things are scaling exponentially: the number of changes and the size of changes. We cannot consume this much code. Period. On top of that, developers keep saying that reviewing AI-generated code requires more effort than reviewing code written by their colleagues. Teams produce more code, then spend more time reviewing it.

There is no way we win this fight with manual code reviews. Code review is a historical approval gate that no longer matches the shape of the work.

AI code review is still review

AI code review tools are just buying us time.
If AI writes the code and AI reviews it, why do we need a pretty review UI to display that? As much as AI code reviews can be valuable, they will shift left in the dev cycle. There's no reason to waste CI resources and manage versioning between review cycles.

Post-PR review made sense when humans wrote code and needed fresh eyes. When agents write code, "fresh eyes" is just another agent with the same blind spots. The value here is in the iteration loop, not in an approval gate.

We know from experience that agents are not always reliable, and it's very human to think, "I caught the AI doing something dumb once; therefore, I must always check it." That instinct made sense when manual verification was feasible. At the current scale, it's not anymore. And it's only going to get worse.

From reviewing Code to reviewing Intent

The answer is to move the human checkpoint upstream. If the thought of not reviewing code seems scary, let me remind you that checkpoints have moved before in software development. We moved from waterfall sign-offs to continuous integration. We can move them again.

Spec-driven development is becoming the main way of working with AI. Humans should review specs, plans, constraints, and acceptance criteria, not 500-line diffs.

In this new paradigm, specs become the source of truth. Code becomes an artifact of the spec. You don't need to review the code. You review the steps. You review the verification rules. You review the contract the code must fulfill.

Human-in-the-loop approval moves from "Did you write this correctly?" to "Are we solving the right problem with the right constraints?" The most valuable human judgment is exercised before the first line of code is generated, not after.

Building trust through layers

How comfortable do we need to get before we stop reading the code? In rule form:

Code must not be written by humans
Code must not be reviewed by humans

LLMs are not great at following commands. They deviate. Frequently.
And they're unreliable at self-verification: they'll confidently tell you the code works while it's on fire. The fix isn't to ask the LLM to verify. It's to ask it to write a script that verifies. Shift from judgment to artifact.

Trust is layered. This is the Swiss-cheese model: no single gate catches everything. You stack imperfect filters until the holes don't align. So, where else can we put approval gates?

Layer 1: Compare Multiple Options

Instead of asking one agent to get it right, ask three agents to try differently and pick the best outcome. Let them compete. The cost of optionality is the lowest in the history of software engineering.

The selection doesn't have to be manual either. You can rank outputs by which one passes the most verification steps, which one produces the smallest diff, which one doesn't introduce new dependencies. Competition creates a signal you wouldn't get from a single attempt.

Layer 2: Deterministic Guardrails

There should be a deterministic way to verify the work. Tests, type checks, contract verification: things that don't have opinions, just facts.

Instead of asking an LLM "Did this work?" you define verification steps that produce a series of pass/fail artifacts. The agent can't negotiate with a failing test. It either meets the specification or it doesn't.

These guardrails can be defined as layers themselves:

Coding guidelines - these can be custom linters
Organization-wide invariants - the non-negotiables, e.g. "No hardcoded credentials, API keys, or tokens"
Domain contracts - specific to a framework, a service, or a part of the codebase, e.g. "Payments domain: all amounts use the Money type"
Acceptance criteria - specific to the task

Verification steps should be defined before the code is written, not invented afterward to confirm what's already there. If the agent writes both the code and the tests, you've just moved the problem: now you're trusting the agent to test the right things.
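As one possible sketch of such a pass/fail gate (the check commands and step names here are hypothetical, not from the article): each verification step is a plain command with an unambiguous exit code, and the gate collects the results as artifacts rather than asking anyone's opinion.

```python
import subprocess

# Hypothetical verification steps, derived from the spec before any code is
# written. Each is a plain command whose exit code is the only verdict.
VERIFICATION_STEPS = {
    "lint": ["ruff", "check", "."],
    "types": ["mypy", "src"],
    "tests": ["pytest", "-q"],
}

def run_gate(steps=VERIFICATION_STEPS):
    """Run every step and return a dict of pass/fail artifacts with logs."""
    results = {}
    for name, cmd in steps.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[name] = {
            "passed": proc.returncode == 0,
            "log": proc.stdout + proc.stderr,
        }
    return results

def gate_passed(results):
    """The change merges only if every artifact reports a pass."""
    return all(r["passed"] for r in results.values())
```

The point of the shape: an agent can argue with a prose review comment, but it cannot argue with `gate_passed(...) == False`.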
Verification criteria need to come from the spec, not from the implementation.

Layer 3: Humans define acceptance criteria

So where do humans add value? Upstream, defining what success looks like.

This is where Behavior-Driven Development becomes newly relevant. BDD was always a good idea: write specifications in natural language that describe expected behavior, then automate those specs as tests. But it never fully caught on, because writing specs felt like extra work when you were also going to write the code.

With agents, the equation flips. The spec isn't extra work; it's the primary artifact. You write the spec. The agent implements. The BDD framework verifies. You never have to read the implementation unless something fails.

This is humans doing what humans are good at: defining what "correct" means, encoding business logic and edge cases, thinking about what could go wrong. The agent handles the translation from intent to code. The BDD specs become your verification layer: deterministic, automated, and defined before the first line is written.

Acceptance criteria authored by humans, verified by machines. That's the gate that actually matters.

Layer 4: Permission Systems as Architecture

What can this agent touch? What requires escalation? These become architectural decisions, not afterthoughts.

Most agent frameworks treat permissions as an all-or-nothing setting. The agent either has shell access or it doesn't. But granularity matters. An agent fixing a bug in a utility function doesn't need access to your infrastructure configs. An agent writing tests doesn't need to modify CI pipelines.

Scope should be as narrow as possible while still letting the agent do useful work. If the task is "fix the date parsing bug in utils/dates.py", the agent's filesystem access should be limited to that file and its test file. Not the whole codebase. Not src/ and tests/. Just the files that matter for this task.

Escalation triggers are equally important.
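A minimal sketch of how both ideas (a narrow write scope and automatic escalation) might be encoded, assuming a hypothetical agent harness; the paths and trigger patterns are illustrative, not from the article:

```python
from fnmatch import fnmatch

# Hypothetical task-scoped policy: the agent may only write the files that
# matter for this task, and sensitive patterns always escalate to a human.
ALLOWED_WRITES = ["utils/dates.py", "tests/test_dates.py"]  # example task scope
ESCALATION_PATTERNS = ["*/auth/*", "*/migrations/*", "requirements*.txt"]

def may_write(path):
    """True if the path falls inside the task's narrow write scope."""
    return any(fnmatch(path, pattern) for pattern in ALLOWED_WRITES)

def needs_human(path):
    """True if touching this path should flag the change for human review."""
    return any(fnmatch(path, pattern) for pattern in ESCALATION_PATTERNS)
```

The policy lives outside the agent, so confidence on the agent's part changes nothing about what gets escalated.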
Certain patterns (touching auth logic, modifying database schemas, adding new dependencies) should automatically flag for human review regardless of how confident the agent is.

Layer 5: Adversarial Verification

Separation of responsibilities: one agent does the work, another verifies. They don't trust each other, and that's the point.

This is an old pattern: it's why your QA team shouldn't report to your engineering manager, and why the person who writes the code shouldn't be the only one who reviews it.

With agents, you can enforce this architecturally. The coding agent has no knowledge of what the verification agent will check. The verification agent has no ability to modify the code to make its own job easier. They're adversarial by design.

You can take this further: a third agent attempts to break what the first agent built, specifically targeting edge cases and failure modes. Red team, blue team, but automated and running on every change.

Conclusions: What good code looks like is changing

The incentive of an agentic system is simple: given a task, can I complete it? Can I please the person who gave it to me? The agent's success is never inherently driven by long-term accuracy or business requirements. It's our job to encode those in the constraints.

For code generated by agents and read by agents, what "good code" looks like will become more standardized. For a new codebase, you'll have to provide less direction because the defaults will be more consistent.

The future is: ship fast, observe everything, revert faster. Not: review slowly, miss bugs anyway, debug in production.

We're not going to outread the machines. We need to outthink them, upstream, where the decisions actually matter.

Ultimately, if agents can handle the code just fine, what does it matter if we can read it or not?

Ankit Jain is the founder and CEO of Aviator, where he's building the infrastructure for AI-native engineering teams.
Aviator's platform helps modern organizations improve AI adoption while maintaining high engineering standards.
