Anthropic recently published a randomized controlled trial showing that developers using AI coding assistance scored 17 percentage points lower on comprehension tests than those coding manually, while productivity gains failed to reach statistical significance. The same study of 52 mostly junior engineers identified a stark divide: participants who used AI for conceptual questions scored 65% or higher on the comprehension quiz, while those who delegated code generation to AI scored below 40%.
A randomized controlled trial by Anthropic researchers examined how AI coding assistants affect skill development when learning new tools. Fifty-two mostly junior engineers with at least one year of weekly Python experience learned Trio, an asynchronous programming library unfamiliar to all participants. Both the control and AI-assisted groups completed two coding tasks followed by a quiz covering debugging, code reading, and conceptual understanding.
The AI group finished approximately two minutes faster, but the difference was not statistically significant. Quiz scores told a different story: the AI group averaged 50% compared to 67% for the manual coding group, with the largest gap in debugging questions.
In a Hacker News thread, siliconc0w captured the central tension: "You're trading learning and eroding competency for a productivity boost which isn't always there."
Another commenter, AstroBen, raised a generational concern: "I wonder if we're going to have a future where the juniors never gain the skills and experience to work well by themselves, and instead become entirely reliant on AI."
How developers interacted with AI determined outcomes more than whether they used it. Low-scoring patterns, averaging below 40%, included complete delegation of code generation to AI, progressive reliance in which developers gradually handed all work to AI, and iterative AI debugging in which developers asked AI to solve problems rather than clarify them. High-scoring patterns, averaging 65% or higher, shared a common thread of cognitive engagement: asking follow-up questions after generating code, combining code generation with explanations, or using AI only for conceptual questions while coding independently. As Hacker News commenter AstroBen noted: "AI is incredibly useful as a personal tutor."
The pattern holds up in independent academic research. A 2024 peer-reviewed study by Jošt, Taneski, and Karakatič at the University of Maribor (Applied Sciences) ran a 10-week experiment with 32 undergraduate students learning React and found near-identical results: significant negative correlations between LLM use for code generation and debugging and final grades, while LLM use for explanations showed no significant negative impact. The authors concluded that this form of LLM use "might not hinder, and could potentially aid, student performance."
Medium contributor Tom Smykowski argued that the Anthropic study measures learning new libraries specifically rather than general programming ability, writing that it shows "not how AI impacts programmers in general, but how AI use impacts learning things that are new to you."
Medium contributor Guru Prasad framed the core tension as cognitive engagement versus cognitive offloading rather than AI versus no AI.
The findings sit alongside Anthropic's earlier observational research showing AI can reduce task completion time by 80% for tasks where developers already have relevant skills. The researchers suggest AI may both accelerate productivity in established skills and hinder acquisition of new ones, though they acknowledge the study measured comprehension immediately after tasks rather than tracking longer-term skill development.
Anthropic recommends deploying AI tools with intentional design choices that support engineers' learning, noting that productivity benefits may come at the cost of the debugging and validation skills needed to oversee AI-generated code. Major LLM providers, including Anthropic and OpenAI, now offer dedicated learning modes designed to prioritize comprehension over delegation, including Claude Code's Learning and Explanatory modes and ChatGPT's Study Mode.