The single biggest improvement you can make to your AI-assisted workflows costs nothing and requires no new tools: split every complex task into planning, doing, and checking. This structural change — treating these as three distinct steps rather than one prompt — produces more reliable output, catches errors before they reach clients or colleagues, and makes it easier to improve each stage independently over time.
Where This Idea Comes From
In April 2026, Anthropic published research on a three-agent harness designed to keep autonomous coding sessions productive across multi-hour runs. The architecture separates work into three distinct roles: a planner that decides what to do, a generator that produces the output, and an evaluator that verifies the result. Each agent operates in a fresh context window, preventing the compounding errors that occur when one session tries to do everything at once. Anthropic's research found that quality degrades measurably in single-context sessions lasting more than an hour — the three-agent design was built specifically to prevent this.
The coding context is technical, but the underlying insight applies to any knowledge work. Anthropic's guidance on building effective agents reinforces the same principle: complex tasks benefit from decomposition. The model doing the work shouldn't also be the model deciding what good looks like. Mixing roles is how you get confident-sounding output that's actually wrong.
Why Mixing Roles Produces Weak Output
When you give AI a single, complex prompt — "Write me a proposal for this client" — you're asking it to simultaneously figure out the goal, produce the content, and judge whether the result is good. That's a lot to hold in one pass. The model can get so focused on generating fluent prose that it glosses over important structural requirements. Or it confidently completes the task without flagging obvious gaps, because nothing in the prompt asked it to look for them.
This is the same problem Anthropic's research labels as context drift — too many competing objectives degrade each one. The fix isn't a better model. It's a better structure. Separating what you want from how to produce it from whether the result is good gives each stage the space to do one job well.
The goal isn't to use three prompts instead of one. It's to use three different types of thinking — and AI handles each of them better when they're kept separate.
The Framework in Plain Terms
You don't need agents or automation to use this. You can run the whole thing yourself with three separate prompts in any AI tool — ChatGPT, Claude, Gemini, Copilot. The structure is:
- Planner: Defines the goal, constraints, format, and success criteria before any output is created.
- Doer: Executes against the plan. Produces the draft, analysis, message, or document.
- Checker: Reviews the output against the plan. Flags gaps, errors, or weak points — without rewriting.
The checker step is the one most people skip. It's also the one that catches the most problems. A model asked to check its own work in the same prompt will almost always tell you it looks fine. A model asked to check work against an explicit brief — especially work it didn't produce — will find real issues. The separation is what makes the check meaningful.
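For teams that do want to wire this up programmatically, the three stages reduce to plain prompt chaining. The sketch below is a minimal illustration, not a prescribed implementation: `call_model` is a stub standing in for whatever chat API you use, and the prompt wording is an assumption you would adapt to your own tasks. The structural point it demonstrates is that the checker receives the explicit brief alongside the draft, which is what makes its review meaningful.

```python
# Minimal sketch of planner -> doer -> checker as three separate prompts.
# call_model is a placeholder: swap in any chat API (Anthropic, OpenAI, etc.).

def call_model(prompt: str) -> str:
    """Stub standing in for a real chat-completion call."""
    return f"[model response to: {prompt[:40]}...]"

def planner_prompt(task: str) -> str:
    # Stage 1: define the goal and success criteria before any output exists.
    return (
        f"Task: {task}\n"
        "Before writing anything, define the goal, the audience, the required "
        "format, and three explicit success criteria."
    )

def doer_prompt(plan: str) -> str:
    # Stage 2: execute against the plan, nothing more.
    return f"Follow this brief exactly and produce the output:\n\n{plan}"

def checker_prompt(plan: str, draft: str) -> str:
    # Stage 3: the checker sees the brief AND the draft, and must not rewrite.
    return (
        "Review the draft against the brief. List gaps, errors, and weak "
        "points only; do not rewrite.\n\n"
        f"Brief:\n{plan}\n\nDraft:\n{draft}"
    )

def run_pipeline(task: str) -> dict:
    plan = call_model(planner_prompt(task))
    draft = call_model(doer_prompt(plan))
    review = call_model(checker_prompt(plan, draft))
    return {"plan": plan, "draft": draft, "review": review}
```

Running each stage as a separate call (or a separate chat session) is what keeps the roles from bleeding into each other — the doer never sees the instruction to self-evaluate, and the checker judges against a brief it didn't write.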
Four Worked Examples
Content Production
You need a LinkedIn post summarising a client result. Instead of prompting "Write a LinkedIn post about X", try three passes: (1) Planner: "Here's what happened with this client. What key points should a LinkedIn post cover — what's the hook, the outcome, and the call-to-action?" (2) Doer: "Write the post based on these points." (3) Checker: "Here's the brief and the draft. Does the post cover all three points? Is the hook strong? Does it read as genuine or promotional?"
The checker will catch things like a missing outcome, a weak opening line, or language that sounds like marketing copy rather than a real result. These are exactly the issues a quick read-through by the author often misses — you know what you meant to say, so your brain fills in what's actually absent.
Data Analysis
You have a spreadsheet and need insights for a management report. Single-prompt approach: "Analyse this data and tell me what's interesting." This produces a list of observations — some useful, some obvious, some wrong. Split approach: (1) Planner: "We need to understand sales performance by region this quarter. What specific questions should we answer, and what would a useful summary look like for a board audience?" (2) Doer: "Answer each question using the data." (3) Checker: "Here are the questions and the answers. Are any answers inconsistent with the data? Are there obvious follow-up questions missing?"
In our workshops, we've seen this structure turn a five-minute AI data dump into an analysis that actually holds up when a finance director asks hard questions. The planner step forces you to specify what 'useful' actually means before the model makes that judgment on your behalf — and that's usually where the real value sits.
Customer Communication
A customer complaint needs a response. Without structure, you get a polished, generic reply that doesn't acknowledge the real issue. With structure: (1) Planner: "Here's the complaint. What does the customer actually want resolved? What tone is appropriate? What should we commit to and what should we avoid committing to?" (2) Doer: "Write the response based on this." (3) Checker: "Does the response acknowledge the specific issue raised? Does it make any commitments we're not sure we can keep? Does the tone match the emotional register of the complaint?"
We often see businesses skip the checker step and send responses that technically answer the complaint but miss the emotional register entirely — leaving customers more frustrated, not less. For anything sensitive, legal, or going to an important client, this step alone is worth the extra two minutes.
Proposal Writing
For a services proposal, single-prompt generation usually produces something that reads well but is wrong for the specific client — generic benefits, approximate scope, misread priorities. The split approach forces alignment before writing begins. (1) Planner: "Based on this brief, what problems is this client trying to solve? What should each section cover? What are the three things that will win or lose this?" (2) Doer: "Write the proposal sections." (3) Checker: "Does the proposal address the three winning criteria? Is any section too generic? Where does it feel like a template rather than a response to this specific client?"
This is the difference between a proposal that looks professional and one that actually closes. The checker step is where AI earns its keep — most people can write a reasonable draft, but catching your own blind spots before submission is genuinely hard. AI can do that dispassionately when you give it the right reference point.
How to Start Using It Today
Begin with a single workflow that regularly produces output you're not quite satisfied with. Map the three stages explicitly: what does 'good' look like (planner), what's the actual task (doer), and how do you know it's done properly (checker). Run each as a separate prompt and notice what the checker surfaces.
Once the pattern is familiar, document the planner prompts for recurring tasks. Your "write a client update" planner can become a team template — same questions every time, shared across everyone doing that type of work. This is where the framework builds real leverage: not just better individual outputs, but consistent outputs across the team. When you're ready to automate structured workflows beyond manual prompting, our guide to delegating tasks to AI agents covers how to hand off these patterns once the manual version is working well.
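One lightweight way to document a recurring planner is as a fill-in template. The sketch below shows the idea for the "write a client update" example; the field names and questions are illustrative assumptions, not a fixed schema — the value is that everyone on the team asks the same planning questions every time.

```python
# Hypothetical team template for a recurring "client update" planner prompt.
# The fields and questions are illustrative; adapt them to your own workflow.

CLIENT_UPDATE_PLANNER = """\
We need to send a client update.
Client: {client}
Work completed this period: {work_done}
Open issues or risks: {risks}

Before drafting, answer:
1. What is the single most important thing this client needs to hear?
2. What tone fits the current state of the relationship?
3. What commitments, if any, should the update make or avoid?
Return the answers as a brief the writer can follow."""

def build_planner(client: str, work_done: str, risks: str) -> str:
    """Fill the shared template so every update starts from the same questions."""
    return CLIENT_UPDATE_PLANNER.format(
        client=client, work_done=work_done, risks=risks
    )
```

The output of this planner then becomes the brief for the doer prompt, and later the reference point for the checker — the same template quietly standardises all three stages.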
The Bigger Principle
Planner-doer-checker isn't a prompt trick. It's a reflection of how good work actually happens — you decide what you want, you make it, and you verify it meets the brief. AI compresses the time each step takes, but it doesn't eliminate the need for the steps themselves. The businesses getting the most reliable results from AI aren't necessarily using better models — they're using better structure.
If your AI outputs feel inconsistent or just not quite right, the answer is rarely a different tool. It's almost always a missing stage. Add the planner. Add the checker. Run them separately. For a broader look at the failure modes this structure prevents, the AI workflow reliability guide covers what breaks down when structure is skipped — and how to build processes that hold up under real-world conditions.
Sources
This article is grounded in the following reporting and primary-source announcements.