"It just makes stuff up." If you've ever said this about ChatGPT — or heard a colleague say it — you're not wrong. Hallucinations have been the single biggest barrier to trusting AI with real business tasks. Research it cites doesn't exist. Statistics it quotes are invented. It delivers nonsense with complete confidence.
Two updates that rolled out in the first week of March 2026 change that picture in meaningful ways. Not completely — but enough that it's worth revisiting where you draw the line on what you let AI do for you.
What Just Landed
OpenAI released two significant model updates within days of each other:
- GPT-5.3 Instant (released 3 March 2026) — the new default ChatGPT model for everyday use. The headline change: a 26.8% reduction in hallucinations when web search is enabled, plus fewer unnecessary refusals and less moralising preamble before answers.
- GPT-5.4 (released 5 March 2026) — available as GPT-5.4 Thinking in ChatGPT and through the API. The standout addition is native computer-use: the model can now control interfaces, websites, and applications directly, executing multi-step tasks on your behalf.
These aren't incremental tweaks. They address two different problems simultaneously — accuracy and autonomy. Understanding the difference matters for figuring out how to put them to work.
The Hallucination Problem, in Plain English
When an AI model "hallucinates," it generates information that sounds plausible but is simply wrong. It might cite a journal article that doesn't exist, quote a statistic it invented, or describe a product feature that was never built. The dangerous part isn't that it's wrong — it's that it sounds equally confident whether it's accurate or not.
This is why business owners hesitate to use AI for anything that actually matters. A marketing email with a made-up industry statistic goes out to your list. A supplier summary that fabricates a policy detail gets passed to your team. The stakes are real, and a tool that looks authoritative while being unreliable is arguably worse than no tool at all.
The hallucination rate has been improving steadily, but GPT-5.3's 26.8% improvement (with web search active) is notable because it targets a root cause: when the model can verify information against the live web rather than relying purely on its training data, it makes fewer things up. That's an architectural improvement, not just fine-tuning.
Do 26% Fewer Hallucinations Actually Matter?
Yes — with caveats.
The improvement applies specifically when web search is enabled in ChatGPT. When the model can pull live information to check its answers, hallucinations drop significantly. When it's working from its training data alone, the gains are less dramatic.
What this means practically:
- Tasks involving current facts, recent events, or real-world data — pricing, news, competitor info — are meaningfully more reliable
- Tasks that don't need web access — drafting copy, summarising documents you've pasted in, answering conceptual questions — haven't changed as much
- The model still isn't infallible. It just makes fewer confident mistakes on fact-dependent tasks
A 26% reduction is the kind of number that moves AI from "interesting experiment" to "actually usable for first drafts" for a lot of business owners. It's not a green light to stop checking its work — but it does shift the calculus on how much review you need to do.
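For readers who use the API rather than ChatGPT, the same principle applies: the reliability gain depends on web search being enabled for the request. A minimal sketch of what that looks like, assuming the tool-style request shape of OpenAI's Responses API; the model identifier `gpt-5.3-instant` is a guess at the API name, not a confirmed string:

```python
# Sketch: enabling web search for fact-dependent tasks via the API.
# The model name "gpt-5.3-instant" is hypothetical; the "web_search"
# tool type follows the pattern of OpenAI's Responses API tools.

def build_research_request(question: str) -> dict:
    """Assemble a request payload with web search enabled, so the
    model can ground factual claims against live sources instead
    of relying purely on training data."""
    return {
        "model": "gpt-5.3-instant",          # hypothetical identifier
        "tools": [{"type": "web_search"}],   # lets the model verify facts
        "input": question,
    }

payload = build_research_request(
    "Which of our three largest competitors changed pricing this quarter?"
)
# To actually send it: client = openai.OpenAI()
#                      response = client.responses.create(**payload)
```

The point of separating payload construction like this is that you can make "web search on" the default for any fact-dependent task in your own tooling, rather than remembering to toggle it per request.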
Computer Control: What It Is and What to Do With It
GPT-5.4's native computer-use capability is a bigger conceptual leap. Instead of just answering questions or generating text, the model can now operate software — clicking through interfaces, filling out forms, navigating websites, and completing multi-step workflows the way a human assistant would.
For small businesses, the practical implications are early-stage but worth knowing about:
- Booking appointments or entering data across systems that don't have integrations
- Navigating supplier portals, government websites, or legacy tools
- Running repetitive workflows that involve copying information between applications
The honest take: this is still maturing technology. It works reliably on structured, predictable tasks and breaks down on anything requiring nuanced judgment. Don't hand it your banking portal. But for low-stakes, high-repetition tasks in controlled environments, it's worth experimenting with — especially if you've been putting off automating something because "there's no easy way to connect these two systems."
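To make the "structured, predictable tasks" point concrete, here is the general shape of a computer-use workflow: the model proposes one UI action at a time, and your code executes it and reports back. This is a conceptual sketch only; the action names and the stub executor are illustrative, not OpenAI's actual computer-use API:

```python
# Conceptual sketch of a computer-use loop: a sequence of small UI
# actions executed one at a time, stopping when the task is done.
# Action kinds ("click", "type", "done") and run_action are stand-ins,
# not real API surface.

from dataclasses import dataclass

@dataclass
class Action:
    kind: str        # e.g. "click", "type", "done"
    target: str = "" # UI element to act on
    text: str = ""   # text to enter, for "type" actions

def run_action(action: Action, log: list) -> None:
    """Stand-in for a real executor (browser driver, OS automation)."""
    log.append(f"{action.kind}:{action.target or action.text}")

def run_workflow(actions: list) -> list:
    """Execute a scripted sequence the way an agent loop would,
    stopping when the model signals it is finished."""
    log = []
    for action in actions:
        if action.kind == "done":
            break
        run_action(action, log)
    return log

# A low-stakes, repetitive example: copying a booking into a portal.
steps = [
    Action("click", target="new-booking"),
    Action("type", text="Acme Ltd, 14 March, 2pm"),
    Action("click", target="save"),
    Action("done"),
]
print(run_workflow(steps))
```

The structure explains both the promise and the failure mode: each step is simple and checkable, so predictable workflows run reliably, but a single ambiguous screen (an unexpected pop-up, a renamed button) derails the whole sequence. That is why controlled environments matter.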
What You Can Now Trust AI With (and What Still Needs Human Eyes)
Here's a practical framework for where the reliability improvements actually shift the threshold:
Higher confidence — AI as first draft or first pass:
- Research summaries — with web search on, and you verify the key claims
- Competitor or market overviews — solid starting point, worth a sanity check
- Email drafts, social copy, client updates — lower factual risk, easy to review
- Meeting prep notes or internal briefing documents
Still requires careful review:
- Anything legal, financial, or compliance-related — non-negotiable
- Specific statistics or data points being cited publicly — verify the source
- Anything going directly to clients without a human reading it first
- Decisions with real consequences — AI as input, not decision-maker
The pattern is consistent: AI earns more trust as a first-draft tool, and less as a final authority. The GPT-5.3 update doesn't overturn that principle — it just shifts the threshold slightly in AI's favour.
Why This Is Actually a Trust Milestone
The "AI makes stuff up" objection has been legitimate. It's held a lot of business owners back from experimenting with anything beyond casual use — and rightly so. The GPT-5.3 improvement directly addresses the most cited reliability concern, and that matters more than any benchmark score.
Trust is built incrementally. The businesses that will see the most value from AI aren't waiting for it to be perfect. They're identifying the right tasks — low stakes, high repetition, easy to verify — and building reliable habits around them now. If you're not sure where to begin, the AI quick wins post is a good place to start.
The gap between "this is a party trick" and "this is a genuine business tool" has been closing all year. These updates close it a little further — and knowing where the new line sits is how you use it well.