As of March 2026, Microsoft 365 Copilot can call on two different AI models depending on the task: OpenAI's GPT and Anthropic's Claude. Wave 3 of Copilot introduced multi-model support, quietly shifting M365 from a single-model assistant into a platform where model selection is now a workflow decision. Most users haven't noticed. Most businesses haven't changed how they use it. That's a missed opportunity — because the two models are genuinely good at different things, and routing the wrong task to the wrong model is how you get outputs that sound confident but aren't.
What Actually Changed in Wave 3
Microsoft's March 2026 update did three things worth knowing about. First, it made Claude available inside Copilot — not as a separate app, but as part of the core experience alongside GPT. Second, it introduced a feature called Critique, where GPT drafts an answer and Claude reviews it for accuracy before you see the result. Third, it launched Copilot Cowork, a capability that lets Copilot run multi-step tasks autonomously for minutes or hours at a time, pairing Claude's reasoning model with M365's file access for longer-horizon work.
This is a meaningful architecture shift. In the old model, Copilot was essentially GPT with access to your files. In the new model, you have a reasoning-focused model (Claude) and a generation-focused model (GPT) operating in the same suite. Knowing which to lean on — and when — is now a workflow skill, not a setup choice.
What Claude Brings to M365
Claude's strengths in M365 centre on tasks that require careful reasoning, document comprehension, and structured analysis. These include:
- Long document review — contracts, reports, policy documents. Claude handles dense text well and tends to surface nuance rather than skim for keywords.
- Analytical interpretation — when you need Copilot to explain why something in a spreadsheet looks the way it does, or trace a pattern across multiple files, Claude's reasoning model is more reliable.
- Accuracy-sensitive drafts — financial summaries, compliance-relevant communications, technical documentation. Situations where factual precision matters more than stylistic polish.
- The Critique review layer — in the new dual-model pipeline, Claude acts as the reviewer, checking GPT's output for accuracy before surfacing the result to you.
Claude also underpins Cowork for long-running tasks — situations where Copilot needs to reason through a multi-step process and carry it to completion without constant prompting. According to Microsoft's March 2026 Wave 3 announcement, Cowork pairs the M365 suite with Anthropic's agentic model specifically because complex, multi-file reasoning tasks demand a model built for deliberate analysis rather than fast generation.
What GPT Brings to M365
GPT remains the better fit for generation-heavy tasks — situations where you need fluent, fast output rather than deliberate analysis:
- Email drafting — GPT is faster and more natural for producing polished first drafts from bullet points or voice notes.
- Slide and document generation — turning a brief into a formatted presentation or structured document. GPT handles stylistic cohesion well.
- Creative ideation — brainstorming, naming, rephrasing, tone shifting. GPT's fluency is the right tool here.
- Quick lookups and summaries — when you need a fast synthesis rather than a careful one, GPT is adequate and responsive.
The practical split: reach for GPT when you're producing something; reach for Claude when you're analysing or reviewing something. That's not a rigid rule, but it's a useful starting heuristic for most business workflows.
How the Critique Feature Changes Accuracy-Sensitive Work
The Critique feature deserves a closer look because it changes how Copilot handles communications where errors have consequences. According to coverage of the Wave 3 launch, Critique combines both models in a single pipeline — GPT generates the draft, Claude reviews it for accuracy before you see it. You're not choosing between models here; the system routes appropriately.
In practical terms, this matters most for external communications — a client proposal, a pricing confirmation, a compliance-sensitive message. The dual-model check doesn't eliminate errors, but it catches more of them before they reach your inbox. For routine internal emails, the extra review layer adds latency without much benefit. Knowing it exists means you can recognise when Copilot is applying it versus when it isn't, and prompt accordingly.
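The generate-then-review pattern behind Critique can be sketched in a few lines. This is an illustrative sketch only, not a real Copilot API: the function names (`gpt_draft`, `claude_review`, `critique_pipeline`) and the toy review rule are our own stand-ins for the two models in the pipeline.

```python
# Illustrative sketch of a Critique-style draft-then-review pipeline.
# The functions below are hypothetical stand-ins, not a Microsoft 365
# Copilot API; the review logic is a toy rule for demonstration.

def gpt_draft(prompt: str) -> str:
    """Stand-in for the generation model: fast, fluent first draft."""
    return f"Draft response to: {prompt}"

def claude_review(draft: str) -> dict:
    """Stand-in for the review model: flags accuracy-sensitive content."""
    issues = []
    if "pricing" in draft.lower():
        issues.append("Verify quoted figures against the source file.")
    return {"approved": not issues, "issues": issues}

def critique_pipeline(prompt: str) -> str:
    """Generate with one model, review with another before surfacing."""
    draft = gpt_draft(prompt)
    review = claude_review(draft)
    if review["approved"]:
        return draft
    # Surface flagged issues rather than silently passing the draft through.
    return draft + "\n[Review flags: " + "; ".join(review["issues"]) + "]"

print(critique_pipeline("Confirm the pricing terms for the client"))
```

The point of the pattern is the separation of roles: the fluent model never gets the final word on accuracy-sensitive output, which is why the extra latency is worth paying for external communications but not for routine internal ones.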
When Model Selection Becomes a Real Skill
In our workshops, we've found that most teams use Copilot at roughly 40–50% of its potential because they treat the tool as a single instrument rather than a toolkit. The multi-model architecture makes this gap wider. Teams that understand what each model is good at get materially better outputs — not because they've changed their workflow, but because they've changed which capability they engage with for a given task.
A common scenario we see: a finance analyst uses Copilot to summarise a quarterly report. GPT produces a clean-sounding summary that mischaracterises a key variance. If they'd routed that document through Claude for analysis, they'd have got a less fluent but more precise reading. The cost of the error is invisible until it surfaces in a meeting. This is the kind of thing no one catches, because the output looks fine.
The skill here isn't technical; it's conceptual. "Use Copilot" is no longer a complete instruction: you also need to decide which mode, which model, and which task type. It's the same mental model that experienced users bring to the broader AI tool landscape; see our guide on right-sizing AI models for SMB workflows for the wider framing.
A Practical Decision Framework
Here's how to think about model choice in M365 Copilot until the interface makes this more explicit:
- Writing something from scratch? Default to GPT. Email drafts, slide content, document outlines.
- Reviewing or interpreting existing content? Route to Claude. Contract review, financial analysis, document comparison.
- Running a multi-step task that takes time? Copilot Cowork uses Claude automatically — this is the right tool for complex research, cross-file reasoning, or workflow automation.
- Sending accuracy-sensitive external communications? The Critique feature engages both models in sequence — let it run.
- Not sure? Use Claude as the default for anything analytical. Its conservative output is less likely to confidently state something wrong.
This framework will evolve as Microsoft surfaces model controls more explicitly in the UI. For now, knowing which features use which model — and what that means for your output — is the practical edge.
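The framework above can be summarised as a simple routing heuristic. This is a conceptual sketch, not how Copilot exposes model selection today: the task categories, the `route_task` function, and the return values are our own labels for the decision points described in the list.

```python
# A minimal sketch of the decision framework as a routing heuristic.
# Task categories and return values are illustrative labels; M365 Copilot
# does not currently expose model routing through an interface like this.

def route_task(task_type: str, external: bool = False) -> str:
    """Map a task type to the model or feature best suited to handle it."""
    generation = {"email_draft", "slides", "ideation", "quick_summary"}
    analysis = {"document_review", "financial_analysis", "comparison"}

    if external:
        return "critique"   # accuracy-sensitive sends get the dual-model check
    if task_type in generation:
        return "gpt"        # producing something: fluent, fast output
    if task_type in analysis:
        return "claude"     # reviewing or interpreting: careful reasoning
    if task_type == "multi_step":
        return "cowork"     # long-running agentic work, Claude-backed
    return "claude"         # when unsure, default to the conservative model

print(route_task("email_draft"))                  # generation task
print(route_task("document_review"))              # analysis task
print(route_task("email_draft", external=True))   # external send
```

Note the fall-through at the end: the default mirrors the framework's last rule, preferring the model less likely to state something wrong with confidence.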
What This Means for Your M365 Rollout
If you're currently rolling out Copilot across a team, Wave 3 changes the training conversation. It's no longer enough to teach people to use Copilot — you now need to teach them to think about task type before they prompt. That's a short training lift but an important one. Teams that don't get this briefing will under-use the accuracy improvements and over-rely on generation output in situations where analysis was what they needed.
For businesses already running Copilot, the multi-model architecture is worth revisiting in your next team check-in. Ask which use cases your team is routing to Copilot and whether the model handling those tasks is the right fit. Most of the time, it won't require any configuration change — just an updated mental model. Our earlier post on Copilot Cowork for complex business workflows covers the agentic side of this update in more depth if you want to go further on autonomous task execution.
The broader shift here is that enterprise AI suites are becoming multi-model environments, and model literacy — knowing what each model does well — is becoming a baseline business skill. That's not a technical concern. It's a people and training concern. And it's already here.
Sources
This article is grounded in the following reporting and primary-source announcements.