I spent $180,000 and three years at law school. Last week, I watched GPT-5 score higher than federal judges on legal reasoning tests, and I felt something I hadn't expected — relief.
Not because AI is "coming for lawyers" or any of that tired narrative.
But because after running my own experiments with GPT-5's legal capabilities for the past two weeks, I finally understood what my actual job is. And it's not what law school taught me.
The Stanford study hit my feed on a Tuesday morning.
Researchers had given GPT-5 and 18 federal judges the same set of complex legal reasoning problems — statutory interpretation, constitutional analysis, precedent application. The AI didn't just pass.
It scored 87% compared to the judges' average of 73%.
My first thought was defensive. "They probably cherry-picked easy cases." So I did what any skeptical lawyer-turned-developer would do.
I built my own test.
I fed GPT-5 twenty real cases from my litigation days — messy ones with contradictory precedents, ambiguous statutes, and the kind of factual complexity that makes junior associates cry.
Cases where reasonable judges had disagreed. Where circuit splits existed. Where the "right" answer depended on which interpretive philosophy you subscribed to.
GPT-5 nailed seventeen of them.
Not just the holdings, but the reasoning paths. It cited relevant cases I'd forgotten. It identified statutory tensions I'd missed during my original research.
In one employment discrimination case, it spotted a procedural issue that had taken me three weeks to find back in 2021.
Here's what nobody's talking about in the "AI beats judges" headlines. GPT-5 isn't just memorizing case law. It's doing something more interesting — and more unsettling.
When I asked GPT-5 to analyze a complex securities fraud case, it didn't just cite _Tellabs_ and _Dura Pharmaceuticals_.
It identified a pattern across 47 different circuit court decisions that showed how courts subtly shifted their scienter analysis based on the defendant's industry.
I've been practicing for six years. I'd never noticed that pattern.
The prompt I used was embarrassingly simple:
```
Analyze this securities fraud complaint for Rule 9(b) particularity and PSLRA scienter requirements. Consider circuit-specific variations in how courts apply these standards.
```
It returned a 2,000-word analysis that my old firm would have billed $3,000 for.
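For the curious, the harness around that prompt was just as simple. Here's a minimal sketch using the OpenAI Python SDK; the "gpt-5" model identifier and the zero temperature are my assumptions about the setup, not anything official:

```python
# Minimal sketch of the research query, using the OpenAI Python SDK.
# The model name "gpt-5" and temperature=0 are assumptions on my part.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "Analyze this securities fraud complaint for Rule 9(b) particularity "
    "and PSLRA scienter requirements. Consider circuit-specific variations "
    "in how courts apply these standards.\n\n{complaint}"
)

def analyze_complaint(complaint_text: str) -> str:
    """Send the complaint with the research prompt; return the analysis."""
    response = client.chat.completions.create(
        model="gpt-5",  # assumed model identifier
        temperature=0,  # keep the analysis as deterministic as possible
        messages=[{"role": "user", "content": PROMPT.format(complaint=complaint_text)}],
    )
    return response.choices[0].message.content
```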
Federal judges are human. They have bad days, implicit biases, and occasionally, they phone it in on routine motions. GPT-5 doesn't.
I ran the same complex tax case through GPT-5 twenty times with slightly different phrasings. The core legal analysis remained consistent every time. The conclusions were identical.
Only the explanation style varied.
Try getting that from twenty different judges. Hell, try getting that from the same judge on different days.
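If you want to run the same check yourself, the harness is a few lines. A sketch, reusing the `client` from the snippet above; the phrasings and the conclusion-extraction step are illustrative stand-ins, not my actual test set:

```python
# Sketch of the consistency test: one tax case, twenty phrasings of the same
# question, then a tally of bottom-line conclusions. Reuses `client` from the
# sketch above; PHRASINGS is illustrative, not my actual list.
from collections import Counter

PHRASINGS = [
    "What is the correct tax treatment of the transaction described below?",
    "Analyze the transaction below and state its tax characterization.",
    # ...eighteen more paraphrases of the same question
]

def consistency_test(case_text: str) -> Counter:
    """Count distinct bottom-line conclusions across all phrasings."""
    conclusions = Counter()
    for phrasing in PHRASINGS:
        response = client.chat.completions.create(
            model="gpt-5",  # assumed model identifier
            messages=[{"role": "user", "content": f"{phrasing}\n\n{case_text}"}],
        )
        analysis = response.choices[0].message.content
        # Crude stand-in: treat the final line as the conclusion. My version
        # used a second model call to reduce each analysis to a one-line holding.
        conclusions[analysis.strip().splitlines()[-1]] += 1
    return conclusions  # a single dominant key means a consistent core analysis
```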
Here's the number that broke my brain: GPT-5 completed what would have been 40 billable hours of research in 8 minutes.
Not rushed, surface-level summaries. Deep analysis with pin citations, alternative arguments, and strategic considerations. The kind of work senior associates pride themselves on.
But here's where the story gets interesting. Because GPT-5 is also catastrophically wrong in ways that would get you disbarred.
Last Thursday, I asked GPT-5 about a niche area of maritime law. It confidently cited _Morrison v. Neptune Shipping Corp._, complete with a federal reporter citation and a compelling quote about admiralty jurisdiction.
That case doesn't exist.
When I called it out, GPT-5 apologized and provided three more cases. Two were real. One was completely fabricated, down to the docket number.
This isn't a training problem you can fix with RLHF. It's fundamental to how these models work. They're probability machines, not truth machines.
And in law, the difference between "probably correct" and "definitely correct" is the difference between keeping and losing your license.
Real litigation involves thousands of documents. Discovery in my last big case produced 2.8 million pages.
Even with GPT-5's expanded context window, you're looking at chunking, summarizing, and hoping you don't lose critical details in the compression.
I tried feeding it a full merger agreement with all exhibits — 400 pages of dense legalese. It choked. Not technically (it processed it), but qualitatively.
It missed subtle inconsistencies between the main agreement and Exhibit J that a first-year associate would catch.
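The chunking itself is the easy part. Here's a naive version, a sketch assuming tiktoken for token counting; the hard part is that every split-and-summarize pass is lossy, which is exactly how an Exhibit J inconsistency slips through:

```python
# Naive token-window chunker, a sketch assuming tiktoken for counting.
# Splitting is trivial; the loss happens when details that only matter
# *across* chunks (main agreement vs. Exhibit J) get summarized away.
import tiktoken

def chunk_document(text: str, max_tokens: int = 100_000,
                   overlap: int = 2_000) -> list[str]:
    """Split text into overlapping chunks that fit the context window."""
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + max_tokens, len(tokens))
        chunks.append(enc.decode(tokens[start:end]))
        if end == len(tokens):
            break
        start = end - overlap  # overlap so clauses aren't severed mid-sentence
    return chunks
```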
Law isn't just logic. It's politics, personality, and power wrapped in Latin phrases.
GPT-5 can tell you what the law says. It can't tell you that Judge Martinez hates discovery disputes and will sanction you for bringing weak motions.
It doesn't know that opposing counsel just went through a divorce and might be more amenable to settlement. It can't read the room when a judge's questions signal they've already decided against you.
These aren't edge cases. They're the entire game at the trial court level.
Here's what the Stanford study actually reveals, and what nobody wants to admit: Most legal work isn't legal reasoning.
It's client management. It's strategy based on incomplete information. It's navigating courthouse politics. It's making judgment calls about risk that no amount of pattern matching can replicate.
GPT-5 beating judges at legal reasoning is like saying a calculator beats mathematicians at arithmetic. True, but it misses the point of what mathematicians actually do.
I haven't quit law. But I've completely changed how I practice.
My new workflow looks like this:
1. **Initial Analysis**: GPT-5 gets first crack at every research question. I use this prompt template:
```
Analyze [specific legal question] under [jurisdiction] law. Include: (1) governing statutes, (2) key cases with pin cites, (3) circuit splits or disagreements, (4) strategic considerations. Flag any areas of uncertainty.
```
2. **Verification Layer**: Everything gets verified. Every case, every quote, every citation. I built a Python script that automatically checks citations against Westlaw's API (a sketch follows this list).
3. **Strategic Overlay**: This is where humans still matter. What's the judge's temperament?
What's opposing counsel's weakness? What story will resonate with this particular jury pool?
4. **Client Translation**: GPT-5 writes the first draft of client explainers. I edit for empathy and context that AI can't grasp.
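Here's the shape of that verification script. A sketch, not my production code: eyecite is real (the Free Law Project's citation extractor), but the Westlaw lookup is a placeholder, since that API is enterprise-gated and I can't publish the endpoint details.

```python
# Sketch of the verification layer. eyecite is the Free Law Project's
# citation extractor; verify_with_westlaw() is a placeholder to swap for
# whatever citation-lookup service you actually have access to.
from eyecite import get_citations

def verify_with_westlaw(citation_text: str) -> bool:
    """Placeholder: return True only if the citation resolves to a real case."""
    raise NotImplementedError("wire this to your citation-lookup service")

def audit_analysis(analysis: str) -> list[str]:
    """Extract every citation from a GPT-5 analysis and flag the suspect ones."""
    suspect = []
    for citation in get_citations(analysis):
        if not verify_with_westlaw(citation.matched_text()):
            suspect.append(citation.matched_text())
    return suspect  # anything in this list gets hand-checked before filing
```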
The result? I'm doing better work in less time. But I'm also having an existential crisis about what "being a lawyer" means in 2026.
By 2028, I think most transactional law will be AI-first. Contract drafting, due diligence, regulatory compliance — GPT-6 or GPT-7 will handle 90% of it with minimal human oversight.
Litigation will hold out longer. Not because the legal reasoning is harder, but because litigation is theater. And audiences still prefer human actors.
The lawyers who survive won't be the ones who know the most law. They'll be the ones who understand that law was never really about the law.
It was about translating human problems into systemic solutions, and systemic solutions back into human outcomes.
GPT-5 can do the translation. It can't do the understanding.
I've been testing GPT-5's legal capabilities for two weeks now. It's better at pure legal reasoning than I am. It's faster, more consistent, and, fabricated citations aside, recalls more case law than I ever will.
So why do clients still hire me?
I think it's because when their world is falling apart — when they're facing bankruptcy, divorce, or criminal charges — they don't need perfect legal analysis.
They need someone who understands what it feels like to lose everything. Who can say "I've seen this before, and you'll get through it" and mean it.
GPT-5 can tell you what the law says about your situation.
Only a human can hold your hand while your life implodes and promise it gets better.
Maybe that's worth $180,000 in student loans after all.
---
**So here's my question for you**: If you're in a knowledge profession — law, medicine, engineering, whatever — have you run your own expertise against GPT-5 yet?
What did you discover about what you actually do versus what you thought you did?
Because I'm starting to think the real disruption isn't AI replacing us. It's AI forcing us to admit most of our "expertise" was just information retrieval with extra steps.
And maybe that's the most liberating thing that could happen to professional work.
---
Hey friends, thanks heaps for reading this one! 🙏
If it resonated, sparked an idea, or just made you nod along — I'd be genuinely stoked if you'd show some love. A clap on Medium or a like on Substack helps these pieces reach more people (and keeps this little writing habit going).
→ Pythonpom on Medium ← follow, clap, or just browse more!
→ Pominaus on Substack ← like, restack, or subscribe!
Zero pressure, but if you're in a generous mood and fancy buying me a virtual coffee to fuel the next late-night draft ☕, you can do that here: Buy Me a Coffee — your support (big or tiny) means the world.
Appreciate you taking the time. Let's keep chatting about tech, life hacks, and whatever comes next! ❤️