The LLM Plateau Myth
Everywhere you look, the same refrain: LLMs have plateaued.
GPT-5 was… not the world-changer it was hyped to be.
Claude Opus 4.1 felt like tweaking around the edges.
Gemini 3? We'll see, but expectations are muted.
But let's pump the brakes on this plateau narrative. To anyone declaring that models have stalled: have you actually looked at 2025's release calendar?
January: o3-mini
February: GPT-4.5 ("Orion"), Claude 3.7 Sonnet, Gemini 2.0 Flash-Lite
March: Grok-3
April: GPT-4.1, o3, o4-mini
May: Gemini 2.5 Pro Preview, Claude 4
June: o3-pro, Gemini 2.5 Pro
July: Grok 4
August: GPT-5, Opus 4.1
This list — with new models dropping literally every month — isn't even complete.
Consider what actually happened:
xAI was barely a footnote until Grok 3 in March made people think "OK, they're serious." Four months later, Grok 4 vaulted them into the frontier tier. Basically overnight.
Google wasn't even worth using in 2024. By June's Gemini 2.5 Pro, they didn't just enter the race — they took the lead.
The entire Claude 4 family that everyone's measuring against? Released in May. Not a year ago. May.
Compare today's models to what amazed us in 2024 and it's laughable. GPT-4o was chatty and charming, but current frontier models smoke it on actual work. Claude was on Sonnet 3.5 — solid but clearly behind today's Sonnet 4. Grok and Gemini weren't even in the conversation.
And I haven't mentioned DeepSeek's reasoning breakthrough, Kimi-K2, or Meta's massive Llama 4 leap.
So why does the plateau narrative persist?
Partly, it's garden-variety impatience. When everything moves this fast, you get addicted to the pace. Why can't it be even faster?
Partly, it's moving goalposts. Everyone wants AGI, but the definition keeps shifting.
As I write this, I just told Claude Code: "dude, you're good to tackle it all. I appreciate you!" It's now systematically building error handling, performance testing, mobile responsiveness, and UX updates for a new product feature.
Imagine telling someone five years ago that you'd have this capability. Is it AGI? Maybe not. But the intelligence feels pretty general.
But the biggest reason for plateau talk?
The models already crush most things people actually need.
Most common LLM tasks don't require frontier power:
General knowledge questions
Concept explanations
Translation
Summarization
Writing help
Basic coding
You want proof? Look at the outcry when OpenAI removed GPT-4o from ChatGPT. People were furious not because they needed cutting-edge reasoning, but because GPT-4o already handled their daily needs perfectly.
The average person isn't asking about RNA synthesis. They want macaroni and cheese recipes for their Instant Pot. Productivity tips for that precious hour between kids' bedtime and their own. Draft emails about school allergen policies.
When current models already nail these tasks, new releases feel like shrugs. "Doesn't seem much better."
So why push for more powerful models at all?
Because there are crucial areas where LLMs still fall short — but are getting better. Practitioners in those domains absolutely notice the improvements in new models.
The real challenge isn't raw intelligence. It's application.
OpenAI learned this the hard way with GPT-5's lukewarm reception. It happens. They course-corrected quickly.
Going forward, the focus should shift from cramming more powerful models down mass-market throats — most people don't care — to the hard work of applying these "already good enough" models to more of what people actually need done.
Frontier companies might do some of this, but startups building the application layer will likely win. It's already happening, and the opportunity is still massive.
The reason LLMs aren't helping more people across more daily tasks isn't that they're not smart enough. It's that they lack the context and the connections to understand someone's specific circumstances and actually make things happen.
The next wave of models will indeed be smarter. But they won't magically solve the application gap. Ingenuity and creativity in applying current models will.