The Effectiveness of AI Agents Is Becoming an Almost Religious Debate

Both extreme views are wrong, but the optimists are closer to right

2026-04-02 ~600 words

The leak of Claude Code’s source code yesterday provided a lot of opportunities to dunk on the bad code produced by AI agents.

You can find lots of takes like this one in which someone identifies seriously janky code in the leak. Anthropic makes no secret that it uses Claude Code to work on its own source code, so chances are this code was written by an agent. Ergo, the reasoning goes, all agent-produced code is shit.

A stronger claim, that agents can't produce anything useful at all, is also common in some corners of the internet. People say they've used the agents, found the results laughable, and will never come back. Because LLM output is probabilistic, the argument goes, nothing an LLM produces can ever be trusted.

Meanwhile, a whole other sphere of the online world, especially on tech podcasts and VC blogs, is proclaiming that software engineering is dead. Agents can build a project from start to finish, without any human intervention beyond the specification of goals. And maybe the agents can even come up with the goals!

These are both straight-up wrong, disprovable with ten minutes of engagement on the topic. AI code is, if anything, a lot nicer than human code¹, and we already have hundreds of useful tools built largely by agents. Just to name one: Monty is a reimplementation of the core parts of the Python language, written in Rust and sandboxed so that it presents fewer security risks. The primary developer spoke on a podcast about how it never would have been possible without agentic coding.

And yet AI agents haven't made software developers obsolete. While they're very good at solving problems with verifiable solutions - usually by testing their own code - they struggle when they can't iterate toward a solution, and they make plenty of errors in judgment. They even hallucinate sometimes, though this has gotten less and less frequent as models and harnesses have improved. And they struggle to know when to "step back" after going too deep into a debugging rabbit hole, when they'd benefit from reconsidering the information they've gathered.

The extreme optimists and extreme pessimists are both wrong, at least for now.

But the pessimists are clearly more wrong. Even without skillful prompting or configuration, agents can already do lots of very valuable coding tasks. They can track down the likely sources of a bug in a complex, multi-repository data pipeline. They can implement whole features with just light feedback from a human. They can analyze a codebase and identify places where documentation has fallen out of step with the code it accompanies².

Even the biggest skeptics, if they seriously tried to use the tools for a week, would be convinced of their value at least in some specific areas. I’ve actually watched this happen to some people I know.

What I don't want – but is already happening – is for people to get so dug in that they reject any evidence that doesn't fit their priors. The pro-AI zealots are at least using the technology enough that they come face-to-face with its limitations sometimes, but I'm afraid the anti-AI crowd is becoming so invested in its position, and so disengaged from the tools themselves, that nothing can convince it.

Tagged: AI, Programming, Tech

  1. Because human code in real projects is actually very sloppy and full of bugs, as any professional developer can tell you. ↩︎

  2. Yes, I personally have successfully used agents to do each of the above tasks. ↩︎