There's a conversation that comes up a lot when we talk to small business owners about AI: "I'd use it more, but I'm not comfortable putting client data into some cloud system I don't control." It's a fair concern. And until recently, the honest answer was: "You're right to be cautious, but that's the trade-off."
That trade-off is changing. Powerful AI models — the kind that can draft emails, summarise documents, answer complex questions, and help with analysis — now run entirely on an ordinary business laptop. No internet connection. No subscription. No data leaving your machine. And as of early 2026, the quality has crossed a threshold that makes this genuinely useful, not just a hobbyist curiosity.
What's Actually Changed
For the past few years, the most capable AI models — GPT-4, Claude, Gemini — were exclusively cloud-based. You'd send your text to a server, the model would process it, and you'd get a response back. That meant your data (and your clients' data) was always travelling somewhere. And it meant a subscription, usually $20–$30 per user per month, often per tool.
Open-source AI models have existed for years, but they were noticeably less capable than the proprietary frontier models. That gap is now closing fast. In February 2026, Alibaba released Qwen 3.5 — a family of models that includes versions capable of running on consumer hardware while matching the performance of leading cloud-based models like Claude Sonnet. Independent benchmarks back this up. The Qwen 3.5-Medium series, in particular, is designed specifically for real-world production use on standard hardware.
This follows a broader pattern. DeepSeek's R1 release in early 2025 showed that efficient training methods could dramatically reduce the compute required to reach frontier performance. Qwen 3.5 is another step in the same direction: smaller models, smarter post-training, competitive results.
The Privacy Case for Running AI Locally
If your business handles anything sensitive — client financials, medical records, legal documents, personal information — the question of where data goes when you use AI tools isn't paranoia. It's due diligence. Most cloud AI providers have terms of service that allow them to use your inputs to improve their models (unless you're on a paid enterprise plan). Even when they don't, a data breach at a third-party provider is still your problem if it involves your clients' information.
Running a model locally changes the equation entirely. The AI runs on your hardware, processes your data, and nothing leaves your machine. There's no account to breach, no server to subpoena, no terms of service to misread. For industries with compliance obligations — healthcare, legal, financial services — this isn't just convenient, it may be necessary.
Local AI means the data never leaves the room. For some businesses, that alone is worth the small setup effort.
The Subscription Fatigue Problem
Beyond privacy, there's a simpler economic reality: AI subscription costs add up. A team of five people using ChatGPT Plus, Microsoft Copilot, and a specialist AI writing tool could easily be spending $300–$500 per month — and that's before any enterprise licensing. For a small business, that's real money.
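Using the article's own figures, that arithmetic is easy to check. The per-tool prices below are illustrative (typical $20–$30 per user per month per tool), not quotes:

```python
# Rough annual cost of stacked AI subscriptions for a small team.
# Per-tool prices are illustrative, not quotes from any vendor.
TEAM_SIZE = 5
TOOLS_PER_USER_MONTHLY = [20, 30, 20]  # e.g. chat assistant, office copilot, writing tool

monthly = TEAM_SIZE * sum(TOOLS_PER_USER_MONTHLY)
annual = monthly * 12
print(f"Monthly: ${monthly}, Annual: ${annual}")  # Monthly: $350, Annual: $4200
```

Even at the low end of those prices, a five-person team is spending thousands of dollars a year before any enterprise licensing.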
Open-source models running locally carry no per-seat fees. Once set up, the only ongoing cost is electricity, on hardware you already own. For many common business tasks — drafting, summarising, answering questions about your own documents — a well-run local model is genuinely comparable to what you'd get from a cloud subscription.
This doesn't mean local AI replaces everything. Some tasks (complex reasoning chains, real-time web search, image generation) still benefit from more powerful cloud infrastructure. But for the everyday AI use cases most SMBs actually need, local models are now a serious option.
How to Actually Get Started: Ollama
The biggest barrier to local AI has always been the technical setup. Running an AI model used to require comfort with command-line interfaces, Python environments, and GPU configuration. That's still one path — but it's no longer the only one.
Ollama is a free, open-source tool that makes running local AI models about as complicated as installing any other app. You download Ollama, pick a model from its library (including Qwen 3.5), and run it. It handles all the technical setup in the background.
- Download Ollama from ollama.com — available for Mac, Windows, and Linux
- Pull a model — open a terminal and run `ollama pull qwen2.5:7b` (or whichever model fits your hardware)
- Chat with it — run `ollama run qwen2.5:7b` and start typing
- Add a UI — tools like Open WebUI give you a ChatGPT-style browser interface sitting on top of Ollama, no terminal required after setup
The whole process takes 20–30 minutes for someone comfortable with installing software. It's not zero friction — but it's closer to setting up a new app than configuring a server.
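Ollama also exposes a local HTTP API (by default at `http://localhost:11434`), which is how tools like Open WebUI talk to it, and how you can wire it into your own scripts. A minimal Python sketch; the model tag `qwen2.5:7b` is illustrative, use whichever model you pulled:

```python
import json
import urllib.request

# Ollama's default local endpoint; nothing here leaves your machine.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON body Ollama's /api/generate endpoint expects.
    stream=False asks for one complete JSON reply instead of a stream."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    """Send a prompt to the locally running model and return its reply."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running in the background, `ask("qwen2.5:7b", "Draft a two-line status update")` returns the model's reply as a string, and the request never leaves localhost.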
What Hardware Do You Need?
This is the honest part: not every business laptop will run every model well. Here's a practical guide:
- Modern MacBook (M1/M2/M3/M4 chip, 16GB+ RAM) — Excellent. Apple Silicon handles local AI particularly well. A 7B or 14B parameter model will run comfortably and quickly.
- Windows PC with a recent NVIDIA GPU (8GB+ VRAM) — Great performance. The GPU accelerates inference significantly.
- Standard Windows or Linux laptop (no dedicated GPU, 16GB RAM) — Works, but slower. Smaller models (7B parameters) are usable; larger ones will feel sluggish.
- Older hardware with 8GB RAM — Technically possible with very small models, but probably not worth the effort for business use.
The "medium" Qwen 3.5 models that match frontier performance require more RAM — typically 32GB or a capable GPU. For most SMBs, the 7B models are the practical sweet spot: genuinely useful, able to run on modern hardware, and fast enough to feel responsive.
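A useful rule of thumb behind those RAM figures: model weights need roughly parameters × bits-per-weight ÷ 8 bytes, and locally run models are typically quantized to around 4 bits per weight. A back-of-envelope calculator (an approximation, not a guarantee — real usage adds overhead for the runtime and context window):

```python
def model_ram_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Rough RAM needed just to hold the model weights, in GB (1 GB = 1e9 bytes).
    Actual usage is higher: budget a few extra GB for the runtime and context."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(model_ram_gb(7))   # 3.5 -- a 4-bit 7B model fits easily in 16GB
print(model_ram_gb(14))  # 7.0 -- workable on 16GB, but tight once overhead is added
```

This is why 7B models run comfortably on a 16GB laptop while the larger "medium" tiers push you toward 32GB or a dedicated GPU.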
What Local AI Is (and Isn't) Good For
Local AI is a strong fit for tasks where you're working with your own data and don't need live internet information:
- Drafting and editing documents, emails, proposals
- Summarising meeting notes, reports, or client intake forms
- Answering questions about documents you feed it
- Generating first drafts from your own templates and examples
- Internal Q&A tools trained on your business knowledge
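The document Q&A items above are, at their simplest, prompt assembly: paste the document into the prompt alongside the question. A sketch using the Ollama command line — the `ask_about` helper and the prompt wording are illustrative, not a standard API:

```python
import subprocess

def build_doc_prompt(document: str, question: str) -> str:
    """Combine a local document and a question into a single prompt."""
    return (
        "Answer the question using only the document below.\n\n"
        f"--- DOCUMENT ---\n{document}\n--- END DOCUMENT ---\n\n"
        f"Question: {question}"
    )

def ask_about(path: str, question: str, model: str = "qwen2.5:7b") -> str:
    """Read a local file and ask the local model about it via the Ollama CLI.
    `ollama run <model> "<prompt>"` prints the reply and exits."""
    with open(path, encoding="utf-8") as f:
        prompt = build_doc_prompt(f.read(), question)
    result = subprocess.run(
        ["ollama", "run", model, prompt],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

For example, `ask_about("client_intake.txt", "What services did the client request?")` keeps the intake form on disk and the answer on your screen, with nothing sent anywhere.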
It's less suitable for tasks that require real-time information (current events, live pricing, today's news) or that benefit from the absolute frontier of reasoning capability. For those, cloud tools still have an edge. The good news: you don't have to choose one or the other. Many businesses end up using local AI for sensitive, everyday tasks and cloud AI for the occasional complex or time-sensitive request.
If you're looking for quick wins to get started with AI in your business, local AI adds a new option to that list — particularly for any workflow where you've hesitated to use cloud tools due to data concerns.
The Bigger Picture
A year ago, "run your own AI" was advice for developers and enthusiasts. Today, with models like Qwen 3.5 matching the quality of premium cloud subscriptions, it's a genuine business decision worth considering — especially in industries where data sensitivity matters, or where subscription costs are becoming hard to justify.
The technology is moving fast. Open-source models are getting better every few months, hardware is becoming more capable, and tools like Ollama are making the setup progressively more accessible. The barrier between "technically possible" and "practically useful" has already been crossed. The question now is whether your business is in a position to take advantage of it.
Getting started doesn't require committing to a full infrastructure change. Download Ollama, try a model, and see if it handles even one task you currently send to a cloud tool. If it does, you've already found value — and you've done it without a subscription, without sending data to a third party, and without anything leaving your office.