I Audited 6 Months of AI PRs. It’s Actually Worse Than You Think.


Stop bragging about your AI-driven velocity. I’m serious.

I just spent the last 72 hours auditing 180 days of Pull Requests from a team that "accelerated" using Claude 4.6 and ChatGPT 5, and the results are terrifying: **we aren't building software anymore, we’re building a house of cards that is one edge-case away from a total infrastructure collapse.**

I’ve been a full-stack developer and educator for over a decade, and I’ve seen every "productivity silver bullet" from Low-Code to No-Code come and go. But this is different.

We have finally decoupled the act of *shipping code* from the act of *understanding logic*, and as of April 2026, we are accruing technical debt faster than any amount of automated refactoring will ever pay it down.

The 10x Velocity Lie

Everyone loves the new metrics.

Your Engineering Manager is showing off charts where "Lines of Code Produced" and "Tickets Closed" have spiked by 400% since the team got enterprise seats for the latest LLMs.

On paper, you look like a god-tier engineering organization.


But lines of code have always been a vanity metric, and in the age of AI, they’ve become actively dangerous.

When I looked under the hood of these "high-velocity" PRs, I didn’t see elegant solutions; I saw **Shallow Shipping.**

The industry has fallen for the "Happy Path" trap.

Because ChatGPT 5 can generate a functional React component in three seconds, developers are stopping the moment the UI renders in their local dev environment.

They aren't stress-testing the state machine, they aren't checking for memory leaks in `useEffect` hooks, and they certainly aren't thinking about how that code behaves when the API returns a 503 or a malformed JSON object.
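Those missing checks are cheap to write. Here is a minimal sketch (the function and type names are illustrative, not from the audited codebase) of the two checks "happy path" code skips: a non-2xx status and a malformed JSON body.

```typescript
interface FetchResult<T> {
  ok: boolean;
  data?: T;
  error?: string;
}

// Parse a raw HTTP response defensively instead of assuming success.
function parseApiResponse<T>(status: number, body: string): FetchResult<T> {
  // Treat non-2xx statuses (e.g. a 503 from a struggling upstream) as
  // failures, not as data.
  if (status < 200 || status >= 300) {
    return { ok: false, error: `HTTP ${status}` };
  }
  try {
    return { ok: true, data: JSON.parse(body) as T };
  } catch {
    // Malformed JSON: surface the failure instead of crashing the render.
    return { ok: false, error: "malformed JSON body" };
  }
}
```

Ten lines of paranoia, and the component degrades gracefully instead of white-screening when the API has a bad day.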

Evidence Exhibit A: The "Ghost" Logic

The first thing that jumped out during my audit was what I call "Ghost Logic." This is code that looks perfectly valid to a human eye but contains subtle, hallucinated logic that only triggers under specific conditions.

In one PR from February, an AI-generated utility function for handling currency conversion worked perfectly for USD and EUR.

However, the LLM had "assumed" a decimal precision for all currencies that didn't exist for the Japanese Yen (JPY), leading to a rounding error that sat in production for two months.

**It cost the company $14,000 in lost transaction fees before a manual audit caught it.**
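The bug class is easy to sketch. This is a hypothetical reconstruction (the helper and the currency table are illustrative, not the audited code): ISO 4217 gives JPY zero minor units, so any "divide by 100" assumption silently misprices yen.

```typescript
// Minor-unit precision per ISO 4217. The generated code effectively
// hard-coded 2 for every currency.
const MINOR_UNITS: Record<string, number> = {
  USD: 2, // cents
  EUR: 2, // cents
  JPY: 0, // yen has no minor unit
};

// Convert an integer amount in minor units to a display amount in major units.
function toMajorUnits(amountMinor: number, currency: string): number {
  const digits = MINOR_UNITS[currency];
  if (digits === undefined) {
    // Fail loudly instead of "assuming" two decimals like the generated code did.
    throw new Error(`Unknown minor-unit precision for ${currency}`);
  }
  return amountMinor / 10 ** digits;
}
```

`toMajorUnits(1250, "USD")` is ¥12.50; `toMajorUnits(1250, "JPY")` is ¥1,250. Divide both by 100 and one of them is off by two orders of magnitude.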

The developer didn't catch it because they didn't write the logic; they prompted the result. When you write code yourself, you're forced to think about the bounds of the problem.

When you prompt, you’re just a consumer of a black-box output. We are losing our "Developer Intuition"—that tingle in the back of your neck that says, "Wait, what happens if this array is empty?"

Evidence Exhibit B: Dependency Bloat is Back with a Vengeance

Remember when we used to joke about `left-pad`? AI has made that look like child's play.

LLMs, especially Claude 4.6, have a strange obsession with suggesting entire NPM packages to solve three-line logic problems.

During my 6-month audit, I found that our `package.json` had grown by 42 dependencies.

When I dug into why, I found that for almost every complex regex or date-manipulation task, the AI had suggested a specialized library rather than writing a native TypeScript function.

The developers, eager to close the ticket, just ran `npm install`.

We now have three different date-formatting libraries in the same microservice because three different developers used three different prompts.

**This isn't "acceleration"; it's a supply-chain attack waiting to happen.** Every new dependency is a new security surface area, and we’re adding them like we’re collecting trading cards.
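For most of those date-formatting installs, the platform already ships the answer. A sketch of the "write it natively" alternative (the locale and format choices here are illustrative): the built-in `Intl` API covers what the audited PRs pulled in libraries for.

```typescript
// Format a Date as e.g. "05 Apr 2026" with zero npm dependencies.
function formatShortDate(date: Date): string {
  return new Intl.DateTimeFormat("en-GB", {
    day: "2-digit",
    month: "short",
    year: "numeric",
    timeZone: "UTC", // pin the zone so output doesn't depend on the server
  }).format(date);
}
```

No `package.json` entry, no transitive dependencies, no supply-chain exposure.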

The Death of the Code Review

This is the part that makes me actually angry. The "Senior" developers on this team—the people who are supposed to be the gatekeepers of quality—have completely checked out.

How can you blame them? When a junior dev submits a PR with 1,200 lines of AI-generated code for a "simple" feature, no human being has the cognitive bandwidth to truly review it.

The reviewer sees that the tests passed (tests that were also written by AI, by the way) and they click 'Approve'.

**We have replaced Code Review with "Vibe Check."** If the code looks like it follows the style guide and the linting passes, it goes to production. But linting isn't logic.

You can write perfectly formatted, type-safe garbage that still nukes your database connections at 2 AM on a Sunday.

The Real Problem: We Are Deleting the "Why"

The actual underlying issue isn't the AI itself; it's the total erosion of the "Why." Engineering is the art of trade-offs.

You choose a specific data structure because of its performance characteristics. You choose a specific architectural pattern because of its scalability.

When you use an LLM to generate a solution, the trade-off is hidden. You don't know why the AI chose a `Map` over an `Object`.

You don't know why it decided to use a recursive function instead of a loop.
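That `Map` vs `Object` choice is a real trade-off, not a coin flip. An illustrative example (not from the audited code) of the behavior an LLM silently picks between:

```typescript
// Plain objects coerce every key to a string.
const counts: Record<string, number> = {};
counts[1] = 10; // the key is stored as the string "1"

// Map keeps key types distinct and iterates in insertion order.
const m = new Map<unknown, number>();
m.set(1, 10);   // number key
m.set("1", 20); // a *different* entry: string key
```

After this runs, `Object.keys(counts)` has one entry while `m.size` is 2. If your cache keys are numeric IDs from an API, that difference is the bug report.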

If you don't know the "Why," you cannot maintain the "What." Six months into this AI experiment, the team is now terrified to touch the "core" services because nobody actually understands how the generated logic works.

**We have built a legacy codebase in record time, and we did it to ourselves.**


What You Should Do Instead (The 2026 Survival Guide)

I’m not telling you to go back to Vim and a physical copy of the MDN docs. That ship has sailed. But if you want to still have a job in 2027, you need to change how you work *immediately*.

1. The "Explain-to-Me" Rule

Never, under any circumstances, commit code that you cannot explain line-by-line to a junior dev.

If the AI gives you a complex `reduce` function or a clever bitwise operation, and you can’t explain exactly how it handles an edge case, you delete it. You rewrite it until it’s "Human-Scale" logic.
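Here is what that rewrite looks like in practice, on a made-up example (neither function is from the audit): same behavior, but the second version is explainable line-by-line.

```typescript
// "Clever" one-liner: groups words by length, edge cases implicit.
const groupByLenClever = (ws: string[]) =>
  ws.reduce<Record<number, string[]>>(
    (acc, w) => ((acc[w.length] ??= []).push(w), acc),
    {}
  );

// Human-scale version: every step and edge case is explicit.
function groupByLen(words: string[]): Record<number, string[]> {
  const groups: Record<number, string[]> = {};
  for (const word of words) {
    if (groups[word.length] === undefined) {
      groups[word.length] = [];
    }
    groups[word.length].push(word);
  }
  return groups; // empty input -> empty object, no special case needed
}
```

The one-liner saves eight lines and costs you the next maintainer's afternoon.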

2. Strict Audit Trails for Prompts

Stop copy-pasting code. Start documenting the intent. We’ve started requiring developers to include their primary prompt in the PR description for any logic-heavy changes.

This allows reviewers to see the *constraints* given to the AI, which is often where the bugs start.

3. Kill the "Mega-PR"

If an AI generates 500 lines of code for you, you don't get to submit a 500-line PR. You break it down into 50-line chunks of verifiable logic.

If you can't break it down, it means the code is too tightly coupled and the AI has "hallucinated" a monolith where a module should be.

4. Test the AI, Don't Let the AI Test

Use AI to write your boilerplate, but manually write your "Fail" cases. AI is great at writing tests that pass. It’s terrible at writing tests that *should* fail.

If you don't manually write the tests for your edge cases, you don't have a test suite; you have a "Confirmation Bias Suite."
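A quick sketch of the difference (the `parsePercent` helper is invented for the example): the first assertion is the kind an AI writes for you; the loop of fail cases is the part only a human bothers to think up.

```typescript
// Parse "42%" into 0.42; reject junk instead of returning NaN.
function parsePercent(input: string): number {
  const match = /^(\d+(?:\.\d+)?)%$/.exec(input.trim());
  if (match === null) {
    throw new Error(`Not a percentage: "${input}"`);
  }
  return Number(match[1]) / 100;
}

// The happy-path test an AI will happily generate:
console.assert(parsePercent("42%") === 0.42);

// The fail cases a human has to hunt for: empty string, missing "%",
// negative sign, non-numeric garbage. These MUST throw.
for (const bad of ["", "42", "-5%", "abc%"]) {
  let threw = false;
  try {
    parsePercent(bad);
  } catch {
    threw = true;
  }
  console.assert(threw, `expected parsePercent("${bad}") to throw`);
}
```

Four fail cases took thirty seconds to write and cover more real-world input than a hundred generated happy-path tests.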

The Uncomfortable Truth

How many hours have you spent "tuning" a prompt this week because you didn't want to look up the documentation for a new API?

When was the last time you actually felt the satisfaction of solving a hard logic problem without reaching for a chat window?

We are trading away our expertise for convenience. We’re becoming "Prompt Managers" instead of Software Engineers.

And the problem with being a Prompt Manager is that once the AI gets 10% better, the company won't need the manager—they'll just talk to the AI themselves.

**The only way to remain valuable in a world of infinite code is to be the person who understands what the code is actually doing.** Otherwise, you’re just a glorified copy-paster, and your expiration date is approaching faster than you think.

Have you noticed the quality of your team's PRs slipping since you went all-in on AI, or am I just being a "Senior Dev" curmudgeon? Let’s fight about it in the comments.

***

Story Sources

r/webdev (reddit.com)


Hey friends, thanks heaps for reading this one! 🙏

Appreciate you taking the time. If it resonated, sparked an idea, or just made you nod along — let's keep the conversation going in the comments! ❤️