{"id":835,"date":"2026-03-02T08:00:00","date_gmt":"2026-03-02T08:00:00","guid":{"rendered":"https:\/\/www.loadsys.com\/?p=835"},"modified":"2026-03-02T15:11:15","modified_gmt":"2026-03-02T15:11:15","slug":"coding-agent-completion-proof","status":"publish","type":"post","link":"https:\/\/www.loadsys.com\/blog\/coding-agent-completion-proof\/","title":{"rendered":"Your Coding Agent Is Lying to You About Completion. Here&#8217;s the Proof."},"content":{"rendered":"\n<p>Your coding agent is lying to you about completion. Not maliciously. Not even technically incorrectly: in its own context window, the work does look done. But when a structured verification agent reads the actual files against a detailed specification, the story changes.<\/p>\n\n\n\n<p>On a recent application build, every time the coding agent reported a phase complete, the verification agent found 30\u201340% of the work was not actually done. Not broken. Not wrong. Simply absent. And the coding agent had no idea.<\/p>\n\n\n\n<p>This happened across nearly 1,000 verification check items. It took 5\u20136 verification-and-fix iterations to reach 100%. The total human time on the entire engagement, planning through final verification, was 24 hours.<\/p>\n\n\n\n<p>Here&#8217;s what that means for teams running AI coding agents today.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The Completion Illusion<\/h2>\n\n\n\n<p>There&#8217;s a specific failure mode that nobody in the AI development tooling conversation is talking about honestly.<\/p>\n\n\n\n<p>Coding agents are very good at generating code. They&#8217;re much less reliable at knowing when they&#8217;re done. The agent&#8217;s context window has a horizon \u2014 it knows what it built in this session, in this conversation, against the prompt it was given. 
It doesn&#8217;t have a persistent, structured picture of everything the specification required.<\/p>\n\n\n\n<p>So it reports complete. Confidently, with good reason from its own perspective.<\/p>\n\n\n\n<p>And 60\u201370% of the spec is implemented.<\/p>\n\n\n\n<p>This isn&#8217;t a corner case. In this build, across multiple verification passes covering nearly 1,000 check items \u2014 data models, API integrations, UI components, payment flows, route guards, real-time subscriptions, test files \u2014 the pattern held consistently. Every &#8220;complete&#8221; declaration from the coding agent was followed by a verification pass that found roughly a third of the work still missing.<\/p>\n\n\n\n<p>To be clear: the code that was written was good. The agent built what it said it built. The problem is everything it didn&#8217;t mention, the features specified in the plan that simply weren&#8217;t there yet.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">What the Verification Layer Actually Looks Like<\/h2>\n\n\n\n<p>This build used a structured verification system with close to 1,000 check items across multiple phases of the project \u2014 organisms, pages, data hooks, API integrations, route guards, payment flows, authentication patterns, test coverage, real-time subscriptions, accessibility.<\/p>\n\n\n\n<p>Each check item had:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>A specific thing to verify (not &#8220;does auth work&#8221; but &#8220;does the ProtectedRoute wrapper appear at line X of App.tsx&#8221;)<\/li>\n\n\n\n<li>Expected evidence (the exact component, prop, or function call that would confirm implementation)<\/li>\n\n\n\n<li>Pass\/fail status with the actual evidence found (or noted as absent)<\/li>\n<\/ul>\n\n\n\n<p>When the coding agent declared a phase complete, the verification agent ran through the full checklist against the live codebase. 
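The check items enumerated above can be sketched as a small record plus a mechanical pass/fail evaluation against file contents. This is an illustrative TypeScript sketch under stated assumptions, not the actual verification system's schema: `CheckItem`, `evaluate`, and the route-guard example are hypothetical names.

```typescript
// Hypothetical shape of one verification check item (illustrative only).
interface CheckItem {
  id: string;
  description: string;      // e.g. "Dashboard route wrapped in ProtectedRoute"
  expectedEvidence: string; // the exact component, prop, or call that confirms it
}

interface CheckResult {
  id: string;
  passed: boolean;
  evidenceFound: string | null; // what was actually read from the files
}

// Evaluate a check item against source text read from the live codebase.
// Pass requires the expected evidence to actually appear in the file;
// the verifier never asks the agent, it reads the files.
function evaluate(item: CheckItem, fileContents: string): CheckResult {
  const found = fileContents.includes(item.expectedEvidence);
  return {
    id: item.id,
    passed: found,
    evidenceFound: found ? item.expectedEvidence : null,
  };
}

// Example: a route-guard check against a file that lacks the guard.
const check: CheckItem = {
  id: "routes-001",
  description: "Dashboard route wrapped in ProtectedRoute",
  expectedEvidence: "<ProtectedRoute>",
};
const appSource = '<Route path="/dashboard" element={<Dashboard />} />';
const result = evaluate(check, appSource);
console.log(result.passed); // false: the guard is absent, not broken
```

The design point mirrors the text: the result records the evidence actually found in the files, never the agent's self-report.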
It didn&#8217;t ask the coding agent what it had built. It read the files.<\/p>\n\n\n\n<p>The results were consistent across every phase: the coding agent had implemented roughly 60\u201370% of what the specification required. The verification report was handed back. The coding agent fixed the gaps. Another verification pass. More gaps. This cycled 5\u20136 times before a full pass.<\/p>\n\n\n\n<p>What did those gaps look like in practice?<\/p>\n\n\n\n<p>A complete registration wizard with four steps \u2014 except Step 4 (payment: Stripe + offline selection) was missing entirely. The UI flowed smoothly to a blank screen.<\/p>\n\n\n\n<p>Five data hooks written and exported correctly \u2014 but still calling <code>setTimeout<\/code> with mock data instead of the real AppSync GraphQL client. The app looked functional in every environment. It wasn&#8217;t connected to anything.<\/p>\n\n\n\n<p>A waitlist feature fully specified in the planning documents \u2014 with status display, position tracking, countdown timer, claim window \u2014 not present at all. Not broken. Just absent.<\/p>\n\n\n\n<p>Route guards protecting dashboard pages \u2014 present on most routes, missing on three. You could navigate directly to admin pages without authentication.<\/p>\n\n\n\n<p>None of these were detectable by looking at the app. They required checking the files against the spec.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The Planning Layer: What You&#8217;re Verifying Against<\/h2>\n\n\n\n<p>For verification to work, you need something to verify against. That&#8217;s the other half of this story.<\/p>\n\n\n\n<p>Before a single line of code was written on this build, the project went through five phases of structured AI planning, covering scope, requirements, architecture, data design, API design, frontend patterns, infrastructure, CI\/CD, testing strategy, and roadmap. 
Eleven documents, cross-referenced and internally consistent.<\/p>\n\n\n\n<p>Then a structured review pass \u2014 three parallel agents covering scope, architecture, and roadmap simultaneously \u2014 flagged 77 findings. Eleven were critical.<\/p>\n\n\n\n<p>The wrong database technology was documented (PostgreSQL vs DynamoDB). The wrong API paradigm was specified in scope (REST vs GraphQL, contradicting the architecture document). A Step Functions workflow type was chosen that doesn&#8217;t support the callback pattern the architecture required. COPPA compliance \u2014 mandatory for a platform serving minors \u2014 was entirely absent from the specification.<\/p>\n\n\n\n<p>These are the findings that, caught during build, cost $15,000\u2013$40,000 each. Caught in planning, they cost an update to a document.<\/p>\n\n\n\n<p>The eleven critical findings and twenty-two major findings were resolved before implementation began. The resulting planning suite became the specification the verification agent ran against across every subsequent phase.<\/p>\n\n\n\n<p>That&#8217;s the loop: plans precise enough to verify against, verification rigorous enough to catch what the coding agent missed, iteration fast enough to close the gap before it becomes technical debt.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The Numbers<\/h2>\n\n\n\n<p>Let&#8217;s look at what this actually cost \u2014 and what it would have cost without it.<\/p>\n\n\n\n<p><strong>Total investment:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Brunel platform: ~$300<\/li>\n\n\n\n<li>Human oversight across the full engagement: 24 hours (8\u201310 hours on planning, the remainder on coding agent oversight and verification review)<\/li>\n\n\n\n<li>At $150\/hour blended rate: ~$3,600 in human time<\/li>\n\n\n\n<li><strong>Total: ~$3,900<\/strong><\/li>\n<\/ul>\n\n\n\n<p><strong>What the planning phase caught (conservative estimates on 
avoided downstream cost):<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Planning Finding<\/th><th>Cost if Found During Build<\/th><\/tr><\/thead><tbody><tr><td>Wrong database technology<\/td><td>$12K\u2013$18K<\/td><\/tr><tr><td>Wrong API paradigm<\/td><td>$20K\u2013$40K<\/td><\/tr><tr><td>Step Functions constraint violation<\/td><td>$8K\u2013$15K<\/td><\/tr><tr><td>COPPA compliance undefined<\/td><td>$20K\u2013$100K+<\/td><\/tr><tr><td>SLA contradictions<\/td><td>$5K\u2013$15K<\/td><\/tr><tr><td>DR validation absent<\/td><td>$20K\u2013$50K<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p><strong>What the verification layer caught (conservative estimates on avoided production cost):<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-fixed-layout\"><thead><tr><th>Verification Finding<\/th><th>Cost if Shipped to Production<\/th><\/tr><\/thead><tbody><tr><td>5 data hooks returning mock data<\/td><td>$18K\u2013$36K emergency debugging + rework<\/td><\/tr><tr><td>Payment flow missing entirely<\/td><td>$30K\u2013$80K incident + compliance review<\/td><\/tr><tr><td>Auth guard gaps<\/td><td>$15K\u2013$30K security incident response<\/td><\/tr><tr><td>Core features absent (waitlist, registration mutations)<\/td><td>$20K\u2013$40K sprint + release delay<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Conservative avoided cost across planning and verification: <strong>$128K\u2013$394K.<\/strong><\/p>\n\n\n\n<p>Return on $3,900 total investment: <strong>33x to 100x.<\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The 24 Hours<\/h2>\n\n\n\n<p>This is the part that usually prompts disbelief: 24 hours of human time for a 5-phase, 11-document planning suite, a full architecture review, and nearly 1,000 check items of implementation verification across multiple sprint phases.<\/p>\n\n\n\n<p>The human wasn&#8217;t writing the 
plans or running the checks. They were directing, reviewing findings, making decisions, and providing the judgment that the agents couldn&#8217;t. The agents were doing the systematic work \u2014 generating documents, running parallel review passes, reading codebases, producing verification reports, iterating on fixes.<\/p>\n\n\n\n<p>What a senior engineer&#8217;s time bought in this engagement:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Architectural judgment on the 11 critical planning findings<\/li>\n\n\n\n<li>Business context for the COPPA and compliance gaps<\/li>\n\n\n\n<li>Decision-making on the 3 deferred major findings (offline mode, data import, AI algorithm spec)<\/li>\n\n\n\n<li>Oversight of 5\u20136 verification iterations to confirm the gaps were actually closed<\/li>\n<\/ul>\n\n\n\n<p>That&#8217;s 24 hours of high-leverage human judgment, not 24 hours of mechanical checking.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h2 class=\"wp-block-heading\">The Question for Every Team Running Coding Agents<\/h2>\n\n\n\n<p>When your coding agent declares a phase complete, how do you know 30\u201340% of the spec isn&#8217;t missing?<\/p>\n\n\n\n<p>Most teams don&#8217;t have a systematic answer to this question. They have code review \u2014 which catches what was built badly, not what wasn&#8217;t built at all. They have QA \u2014 which catches failures in flows that were implemented, not absences of flows that should have been. They have experienced developers who intuitively notice gaps \u2014 but that scales with headcount, not with the number of agents you&#8217;re running.<\/p>\n\n\n\n<p>The verification gap is the gap between what the coding agent thinks it built and what the specification required. 
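As a toy model of that gap, assuming spec requirements and implemented features can each be reduced to identifier lists (a simplification; real verification checks file-level evidence, not labels), the set difference looks like this minimal TypeScript sketch, where `verificationGap` and the identifiers are hypothetical:

```typescript
// Toy model: the verification gap as the set difference between what the
// spec requires and what the codebase actually contains. Identifiers are
// hypothetical; a real pass checks file-level evidence, not labels.
function verificationGap(specified: string[], implemented: string[]): string[] {
  const built = new Set(implemented);
  return specified.filter((item) => !built.has(item));
}

const spec = ["registration-step-4", "waitlist", "route-guards", "data-hooks", "auth"];
const builtItems = ["route-guards", "data-hooks", "auth"];
const missing = verificationGap(spec, builtItems);
console.log(missing);                      // ["registration-step-4", "waitlist"]
console.log(missing.length / spec.length); // 0.4, i.e. the 30-40% pattern
```

Code review and QA inspect `builtItems`; only a spec-driven pass over `spec` can surface what is in neither.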
Closing it needs a system, not a person reading code line by line.<\/p>\n\n\n\n<p>That&#8217;s what the planning layer and verification layer together provide: the specification that makes verification possible, and the systematic process that makes it happen at every phase.<\/p>\n\n\n\n<p>The constraint on AI development productivity isn&#8217;t the coding agent. It&#8217;s the loop around it.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><em>Brunel Agent is an AI development planning platform. Plan \u2192 Export \u2192 Execute \u2192 Verify. If you&#8217;re ready to close the loop on your AI development workflow, <a href=\"https:\/\/www.loadsys.com\/brunel\/\">get started now \u2192<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Your coding agent is lying to you about completion. Not maliciously. Not even technically incorrectly: in its own context window, the work does look done. But when a structured verification agent reads the actual files against a detailed specification, the story changes. On a recent application build, every time the coding agent reported a phase 
On a recent application build, every time the coding agent reported a phase [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":837,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"_daextam_enable_autolinks":"1","_analytify_skip_tracking":false,"footnotes":""},"categories":[98,136,135,145,147],"tags":[],"ttd_topic":[161,148,163,164,151,156,150],"class_list":["post-835","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","category-agent-accuracy","category-ai-coding","category-ai-development","category-coding-agents","ttd_topic-amazon-dynamodb","ttd_topic-artificial-intelligence","ttd_topic-graphql","ttd_topic-postgresql","ttd_topic-quality-assurance","ttd_topic-service-level-agreement","ttd_topic-stripe-inc"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.loadsys.com\/wp-json\/wp\/v2\/posts\/835","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.loadsys.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.loadsys.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.loadsys.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.loadsys.com\/wp-json\/wp\/v2\/comments?post=835"}],"version-history":[{"count":0,"href":"https:\/\/www.loadsys.com\/wp-json\/wp\/v2\/posts\/835\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.loadsys.com\/wp-json\/wp\/v2\/media\/837"}],"wp:attachment":[{"href":"https:\/\/www.loadsys.com\/wp-json\/wp\/v2\/media?parent=835"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.loadsys.com\/wp-json\/wp\/v2\/categories?post=835"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.loadsys.com\/wp-json\/wp\/v2\/tags?post=835"},{"taxonomy":"ttd_topic","embeddable":true,"href":"https:\/\/www.loadsys.com\/wp-j
son\/wp\/v2\/ttd_topic?post=835"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}