Claude’s Cycles Just Changed Everything. The Proof Is Actually Shocking.

Enjoy this article? Clap on Medium or like on Substack to help it reach more people 🙏

I stopped writing unit tests yesterday. Not because I’m lazy, and certainly not because I’ve suddenly become a perfect coder.

I stopped because I watched Claude 4.6 fix a race condition in a distributed system that had been haunting our team for three months—and it did it while I was getting a coffee.

We’ve all been told that AI is a "copilot." We’ve been conditioned to copy-paste code, run it, see the red text in the terminal, and then paste that error back into the chat.

It’s a tedious, manual loop that makes us feel like highly paid administrative assistants to a very smart, but very blind, LLM.

**Then came Claude’s Cycles.**

The whitepaper dropped on Hacker News last night, and the engagement numbers are already off the charts. It’s not just another model update or a slightly larger context window.

It is the formalization of the "Execution Loop"—a fundamental shift from AI that *predicts* code to AI that *verifies* its own reality through iterative cycles of execution.

The Moment the "Copilot" Era Died

I remember the exact moment the old way of working felt obsolete. It was Tuesday morning, and I was trying to migrate a legacy Express backend to a serverless architecture using Claude 4.5.

The model kept hallucinating a specific library’s middleware signature that changed in the latest version.

I spent forty minutes in a "prompt-error-retry" loop.

I would paste the code, it would fail, I’d paste the error, and Claude would apologize with that classic, "You're absolutely right, my apologies..." line before hallucinating a *different* error.

It was the "Stochastic Parrot" wall, and I hit it at 60 miles per hour.

**Cycles changed the physics of that interaction.** When I enabled the new Cycle-mode in the Anthropic dev-console, I didn't just get a code block. I watched a live terminal window spawn.

I watched the AI attempt to run the migration, catch its own "Module Not Found" error, look up the documentation via the browser tool, adjust the package.json, and try again.

It took fourteen internal cycles. It failed eleven times. But by the time it presented the "final" answer to me, it wasn't a suggestion.

**It was a verified, running reality.** The proof wasn't in the prose; it was in the green checkmarks on the side of the screen.

What is a "Cycle," Really?

For those who haven't waded through the 40-page PDF yet, "Cycles" isn't just a fancy name for an agent. It’s a structural change in how Anthropic handles inference.

Traditional models (like the early versions of ChatGPT 5) are "linear predictors." They start at word one and end at word five hundred.

**Cycles introduce a recursive feedback loop** directly into the latent space of the model. Instead of just outputting tokens, the model is given a "Sandbox" (as they call it in the paper).

It writes, executes, observes the output (STDERR/STDOUT), and feeds that output back into the next "Cycle" of its own thought process before the user ever sees a single line of code.
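Anthropic hasn’t published the internals, but the write–execute–observe loop described above can be sketched in about twenty lines of Python. Everything here is my own illustration, not Anthropic’s actual API: `generate` stands in for the model, and the sandbox is just a subprocess.

```python
import subprocess
import sys
import tempfile

MAX_CYCLES = 14  # cap the loop so a failing task cannot spin forever

def run_candidate(source: str) -> tuple[bool, str]:
    """Run candidate code in a subprocess; return (succeeded, captured STDERR)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=30
    )
    return result.returncode == 0, result.stderr

def execution_loop(generate):
    """generate(feedback) -> source code; feedback is the previous cycle's STDERR."""
    feedback = ""
    for cycle in range(1, MAX_CYCLES + 1):
        source = generate(feedback)
        ok, stderr = run_candidate(source)
        if ok:
            return source, cycle  # verified: the code actually ran
        feedback = stderr  # the observed error feeds the next cycle
    raise RuntimeError("no verified solution within the cycle budget")
```

The key design point is the last line of the loop: failure output isn’t shown to the user, it becomes the *input* to the next cycle.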

This solves the "Confidence Gap." We’ve all seen AI confidently give us code that doesn't even compile.

With Cycles, Claude 4.6 literally *cannot* be confident until it has seen the code successfully run in its internal environment.

It’s the difference between a student guessing the answer to a math problem and a student checking their work with a calculator before turning in the test.

The Shocking Proof: The "Impossible" Benchmark

The Hacker News thread is losing its mind over the "Refactoring Benchmark" included in the Cycles paper.

Anthropic took a 50,000-line codebase—written in 2022-era React—and asked the model to convert it to a modern, type-safe Next.js 16 app with 100% test coverage.

Under the old linear-inference model (Claude 3.5), the success rate was 7%. It would get lost in the dependencies, hallucinate API routes, and eventually break the context window.

**With Cycles enabled, Claude 4.6 hit a 92% success rate.**

The shocking part? The average number of "Cycles" per file was 3.4. This means the model was catching and fixing an average of two to three errors per file *before* showing the human the result.

When you see the side-by-side diffs, it’s terrifying. It’s doing the kind of deep, structural thinking that we usually reserve for Senior Staff Engineers on a six-month roadmap.

Why ChatGPT 5 is Suddenly Feeling "Static"

I’ve been a dual-user of OpenAI and Anthropic for years. ChatGPT 5’s "Reasoning" (o3-style) is brilliant for logic puzzles and high-level architecture.

But compared to Claude’s Cycles, it feels like it’s living in a vacuum. It’s "Thinking" (internal monologues), but it’s not "Doing" (interacting with a runtime).

**OpenAI is focused on the "Brain," but Anthropic just gave Claude "Hands."**

The "Proof" I keep mentioning is the cost-to-output ratio.

A "Cycle-enabled" prompt is significantly more expensive—sometimes 10x the price of a standard 4.6 call—because you're paying for the compute of the internal failures.

But when you factor in the four hours of developer time saved by not having to manually debug "hallucination debt," the ROI is undeniable.

We are moving from a world of "Cheap Tokens" to "Expensive Certainty."
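That ROI claim is easy to sanity-check with back-of-the-envelope numbers. The dollar figures below are illustrative assumptions of mine, not pricing from the paper; only the "10x" multiplier and "four hours saved" come from the text above.

```python
# Illustrative numbers only: assumed prices, not Anthropic's actual pricing.
standard_call = 0.40              # assumed cost of a standard 4.6 call, USD
cycle_call = standard_call * 10   # "sometimes 10x the price"
dev_hourly_rate = 75.0            # assumed loaded engineer cost, USD/hour
hours_saved = 4.0                 # "four hours of developer time saved"

roi = (dev_hourly_rate * hours_saved) / cycle_call
print(f"Cycle call: ${cycle_call:.2f}, "
      f"time saved: ${dev_hourly_rate * hours_saved:.2f}, "
      f"ROI ~ {roi:.0f}x")
```

Even if my assumed numbers are off by an order of magnitude, the loop still pays for itself whenever it actually replaces manual debugging time.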

The Reality Check: The "Loop Death" Risk

It’s not all magic and rainbows. There is a dark side to this new architecture that the HN crowd is already flagging.

If you give an AI a loop and a goal, it will keep looping until it hits that goal—or until it drains your credit card.

I ran a test yesterday where I asked Claude to "Optimize this SQL query for sub-10ms latency." I didn't realize the database it was hitting was poorly indexed on the hardware side.

**The model went through 42 cycles in three minutes.** It tried every trick in the book: materialized views, CTE refactoring, even trying to rewrite the schema.

It never hit the 10ms goal, and the bill for that single "Ask" was $14. In a world of Cycles, a poorly worded prompt isn't just a bad answer—it’s a financial leak.

We need to learn "Loop Budgeting" just as much as we learned "Prompt Engineering."
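A "loop budget" can be enforced today with a few lines of wrapper code around whatever agent loop you run. This is a minimal sketch of my own; the class, its thresholds, and the `charge` method are hypothetical, not a real Cycle-mode API.

```python
class LoopBudget:
    """Abort an agent loop when either cycle count or estimated spend is exceeded."""

    def __init__(self, max_cycles: int = 20, max_spend_usd: float = 5.0):
        self.max_cycles = max_cycles
        self.max_spend_usd = max_spend_usd
        self.cycles = 0
        self.spend = 0.0

    def charge(self, cost_usd: float) -> None:
        """Call once per cycle with that cycle's estimated cost."""
        self.cycles += 1
        self.spend += cost_usd
        if self.cycles > self.max_cycles:
            raise RuntimeError(f"cycle budget exhausted after {self.cycles} cycles")
        if self.spend > self.max_spend_usd:
            raise RuntimeError(f"spend budget exhausted at ${self.spend:.2f}")
```

With a budget like `LoopBudget(max_cycles=15, max_spend_usd=5.0)`, my 42-cycle, $14 SQL misadventure would have been killed at cycle 16 instead of draining the card.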

The Practical Takeaway for Developers

If you’re still prompting Claude like it’s 2024, you’re leaving 90% of the value on the table.

To trigger the Cycles architecture effectively, you have to stop asking for "Code" and start asking for "Outcomes."

**The Old Prompt:** "Write a Python script to scrape this website and save it to a CSV."

**The Cycle Prompt:** "Using your execution environment, scrape this website. If you encounter a 403 or a captcha, iterate through different header configurations until you get a successful 200 response. Do not return the code until you have verified the CSV contains at least 50 valid rows."


By defining the **Validation Criteria** instead of the steps, you let the Cycles do the heavy lifting. You are no longer the debugger; you are the Quality Assurance lead.
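The "at least 50 valid rows" criterion from the prompt above is the whole trick: it is machine-checkable. Here is what that goal state looks like as code, a sketch of my own (the function name and its notion of "valid" are assumptions, not anything from the paper):

```python
import csv
import io

def csv_meets_goal(csv_text: str, min_rows: int = 50) -> bool:
    """Check the goal state: a header plus at least `min_rows` valid data rows.

    'Valid' here means non-empty and matching the header's column count.
    """
    reader = csv.reader(io.StringIO(csv_text))
    rows = [r for r in reader if any(cell.strip() for cell in r)]
    if len(rows) < min_rows + 1:  # header row + data rows
        return False
    width = len(rows[0])
    return all(len(r) == width for r in rows[1:])
```

A check like this is exactly what the model can run against its own output on every cycle, which is why outcome-shaped prompts converge and step-shaped prompts don’t.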

This shift is going to be painful for developers who find their identity in "fixing bugs." The AI is now better at fixing its own bugs than you are.

Will We Even Write Code in 2027?

Looking at the trajectory from Claude 4.5 to 4.6 Cycles, the "18-month window" for manual coding is shrinking faster than any of us anticipated.

By early 2027, the idea of a human typing out a CRUD app from scratch will feel as archaic as manually managing memory in C++.

We are entering the era of the **Systems Architect**.

Your job is no longer to know the syntax of a `map()` function; it’s to know how to define the "Goal State" so clearly that the AI’s Cycles can converge on it without spinning out of control.

The proof is in the Cycles. The loop is closed. And honestly?

I’m relieved.

I’d much rather spend my day thinking about system architecture and user experience than debugging another "Undefined is not a function" error that a machine could have caught in three milliseconds of internal execution.

**Have you tried enabling Cycle-mode yet, or are you still stuck in the "copy-paste" loop?** I’m curious to hear if anyone else has seen their API bills spike, or their productivity explode.

Let’s talk in the comments.

---


Hey friends, thanks heaps for reading this one! 🙏

If it resonated, sparked an idea, or just made you nod along — I'd be genuinely stoked if you'd show some love. A clap on Medium or a like on Substack helps these pieces reach more people (and keeps this little writing habit going).

Pythonpom on Medium ← follow, clap, or just browse more!

Pominaus on Substack ← like, restack, or subscribe!

Zero pressure, but if you're in a generous mood and fancy buying me a virtual coffee to fuel the next late-night draft ☕, you can do that here: Buy Me a Coffee — your support (big or tiny) means the world.

Appreciate you taking the time. Let's keep chatting about tech, life hacks, and whatever comes next! ❤️