**Stop trusting the "unlimited" label on your AI subscriptions.
I spent all of yesterday morning watching a $140,000-a-year Lead Engineer stare at a "Session Limit Reached" screen after just four prompts—and it’s the clearest sign yet that the Golden Age of cheap, long-context compute is officially over.**
Last Tuesday, I was sitting in a glass-walled conference room in Palo Alto with Sarah, a senior architect at a Series B fintech startup.
She was mid-refactor, using **Claude 4.6** to migrate a legacy codebase to a new serverless architecture.
She had uploaded six core files—maybe 40,000 tokens total—and was asking the model to map out the dependency graph.
"I have two messages left," she said, her voice flat with disbelief. "I just started this session twenty minutes ago. Last month, I could spend four hours in a single thread before Claude even blinked."
Sarah isn't alone. Over the last 72 hours, the **r/ClaudeAI** community has exploded with similar reports from professional users hitting their limits mid-task.
Anthropic has quietly adjusted the way session limits are calculated for Claude 4.6, and for those of us building real products, the implications are disastrous.
As a founder, I’ve seen this movie before.
A company releases a "god-tier" model, hooks the power users with massive context windows and generous limits, and then—once the compute bills start hitting the billions—they start turning the knobs.
But this isn't just a minor throttle.
According to several developers I’ve interviewed this week, the effective utility of **Claude 4.6’s 200k context window** has been cut by as much as 60% for high-intensity tasks.
Anthropic hasn't issued a formal press release about these changes, but the evidence is in the "Usage Exhausted" screens appearing hours earlier than they did in February.
"It feels like they’re taxing our intelligence," Sarah told me. "We built our entire sprint velocity around the assumption that Claude could hold our whole project in its head.
Now, it’s like trying to code with a genius who has five-minute amnesia."
To understand the scale of the change, I reached out to Mark, a CTO of a 50-person engineering firm in Austin.
He’s been tracking his team's seat-usage metrics since they migrated from **ChatGPT 5** to Claude 4.6 earlier this year.
"The math changed overnight around March 15," Mark explained. "We’re seeing a new logic in how 'Project' context is handled. Previously, Anthropic seemed to cache your project files efficiently.
Now, every time you send a message, it feels like the system is re-calculating the entire 100k+ token stack against your limit."
Mark showed me his team's productivity dashboard: a visible 22% dip in completed PRs starting last week.
When he polled his team, the answer was unanimous: they were spending 30 to 45 minutes every afternoon waiting for their "limit reset" or switching back to inferior models just to finish a task.
**"The 'Quiet' Update is the worst kind of update,"** Mark said. "If you’re going to raise prices or lower limits, tell us. Don’t just let our engineers hit a brick wall in the middle of a deployment."
Why is this happening now, on March 28, 2026? To find out, I spoke with a former product researcher from a rival LLM lab who asked to remain anonymous.
They pointed to the "VRAM Crisis" that has been quietly brewing since the start of the year.
"The hardware hasn't kept up with the models," the researcher told me. "Claude 4.6 is significantly more 'expensive' to run per token than 3.5 or even 4.5 was.
When you have millions of users uploading 150,000-token projects and asking 'where is the bug?', you’re burning through specialized NVIDIA Blackwell or Rubin clusters at a rate that is simply not sustainable at $20 a month."
The researcher suggests that Anthropic is likely implementing **Dynamic Token Weighting**.
This means if your session is heavy on "Project" context (those files you keep pinned to the sidebar), each new message you send "costs" significantly more of your daily allowance than a fresh, empty chat.
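Anthropic hasn't published any such formula, so the mechanics here are pure speculation. But a weighting scheme like the one the researcher describes might look roughly like this (the function name, the multiplier, and the token counts are all made up for illustration):

```python
def message_cost(new_tokens: int, pinned_context_tokens: int,
                 context_multiplier: float = 0.5) -> float:
    """Hypothetical quota cost for one message under 'dynamic token weighting'.

    Each message is charged for its own tokens plus a fraction of the
    pinned Project context that gets re-processed on every turn.
    (Illustrative only -- Anthropic has published no such formula.)
    """
    return new_tokens + context_multiplier * pinned_context_tokens

# A short question in a fresh, empty chat:
print(message_cost(200, 0))        # 200.0 quota units
# The same question with a 100k-token project pinned to the sidebar:
print(message_cost(200, 100_000))  # 50200.0 quota units
```

Under a scheme like this, pinning a 100k-token project makes each message roughly 250x more expensive against your daily allowance than the identical question in an empty chat, which would explain sessions ending after a handful of prompts.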
**This is the hidden tax of 2026.** We wanted models that could read our entire codebase, and Anthropic gave them to us.
But they forgot to tell us that "reading" that codebase every time we hit 'Enter' costs more than our subscription is worth.
While Claude users are reeling, the landscape is shifting.
**ChatGPT 5** recently introduced "Infinite Memory," which promised to solve this, but early benchmarks show it’s mostly just a clever RAG (Retrieval-Augmented Generation) trick that often misses the "deep" logic Claude is famous for.
Meanwhile, **Gemini 2.5** still offers a massive 2-million-token window, but as any developer will tell you, "more context" doesn't mean "more logic." I tested the same refactor task Sarah was struggling with on Gemini 2.5 yesterday.
It didn't hit a limit, but it also failed to realize that the API endpoint it suggested had been deprecated since late 2025.
"I’d rather have 5 messages from a genius than 100 messages from a hallucinating intern," Sarah admitted.
"But right now, Anthropic is giving us the genius for only two messages, and that’s not enough to finish a single function."
There is a growing sentiment in the r/ClaudeAI community that the $20/month "Pro" tier has been downgraded to a "Trial" tier. For founders like me, this creates a massive budgeting headache.
If my engineers can't rely on a $20 tool, do I move them to the **API-only workflow**? I ran the numbers this morning.
For a developer like Sarah to do her daily work via the Claude 4.6 API—maintaining that same 100k token context—the cost would balloon from $20 a month to roughly **$450 per developer, per month.**
That is a 2,150% price increase hidden behind a "Session Limit" screen. For a startup with 10 devs, that's the difference between a rounding error and a roughly $50,000 annual line item.
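For transparency, here's the back-of-envelope math behind that figure. The per-token rates and the usage pattern below are my assumptions, not published pricing; plug in your own numbers:

```python
# Assumed API rates and usage -- placeholders, not published pricing.
INPUT_RATE = 3.00 / 1_000_000    # $ per input token (assumed)
OUTPUT_RATE = 15.00 / 1_000_000  # $ per output token (assumed)

context_tokens = 100_000   # pinned project context resent with every call
output_tokens = 2_000      # typical response length (assumed)
messages_per_day = 60      # a heavy refactoring day (assumed)
workdays = 22

per_message = context_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
monthly = workdays * messages_per_day * per_message
print(f"${monthly:,.0f} per developer, per month")  # $436 per developer, per month
```

With these assumptions the total lands a little under $450, and notice where the money goes: almost the entire bill is re-sending the same 100k tokens of context with every single message.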
If you’re a developer or a tech leader, you can’t afford to wait for Anthropic to "fix" this. They likely won't. The compute costs are the new reality of 2026. Here is how I’m advising my team to adapt:
1. **Shred Your Context:** Stop uploading the "whole project." Use a tool to map your file structure into a single Markdown summary and only upload the 2-3 files you are actively editing.
2. **The "One-Shot" Rule:** Treat every Claude message like it costs $5. Stop the "incremental" chatting.
Write 500-word prompts that include the problem, the constraints, the code, and the expected output in one go.
3. **Local Model Offloading:** Use **Llama 4 (8B or 70B)** locally for the "dumb" tasks—formatting, documentation, and simple unit tests. Save your Claude 4.6 tokens for the high-level architecture.
Returning to that conference room in Palo Alto, I watched Sarah close her laptop. She looked defeated.
It wasn't just that she couldn't finish her code; it was that her "flow state" had been shattered by a subscription limit.
"In 2024, we were told AI would make us 10x faster," she said, looking out at the Stanford hills. "In early 2026, it actually did. But today?
I’m spending half my brainpower just managing my 'token budget' so I don't get locked out before lunch. That’s not progress."
We are entering a phase of the AI revolution where **access to intelligence is being tiered** more aggressively than we ever imagined. The "Quiet Update" to Claude's limits is just the first tremor.
If you aren't preparing for a world where "Model Usage" is a precious, finite resource, you’re going to be left staring at a blank screen while your competitors—the ones who can afford the $500/month API bills—keep building.
**Have you noticed your Claude 4.6 sessions ending significantly earlier this month, or are you still seeing the "unlimited" performance we had in February?
Let’s talk about your "usage walls" in the comments.**
Hey friends, thanks heaps for reading this one! 🙏
If it resonated, sparked an idea, or just made you nod along — I'd be genuinely stoked if you'd show some love. A clap on Medium or a like on Substack helps these pieces reach more people (and keeps this little writing habit going).
→ Pythonpom on Medium ← follow, clap, or just browse more!
→ Pominaus on Substack ← like, restack, or subscribe!
Zero pressure, but if you're in a generous mood and fancy buying me a virtual coffee to fuel the next late-night draft ☕, you can do that here: Buy Me a Coffee — your support (big or tiny) means the world.
Appreciate you taking the time. Let's keep chatting about tech, life hacks, and whatever comes next! ❤️