The 4-Question Test: Is Your Task Actually Ready for AI?
A pattern keeps showing up in small construction firms. The owner buys an AI tool to draft quote variations — his three biggest clients each want quotes in their own template, different headers, different ordering of line items, different breakdown of materials versus labour. Friday afternoons go from three hours of reformatting to twenty minutes of reviewing. The first few weeks feel like the cleanest decision of the year.
Then a quote goes out with one line item the wrong way round. Materials and labour swapped. The total looks plausible. The client signs it. Three weeks later, the firm discovers the swap on the second invoice and realises it’s £620 underwater on the job.
The tool comes off the workflow. Not because AI doesn’t work for quote variations — it does. The firm stopped because the decision to deploy it skipped two of the four questions worth asking before any task gets handed to AI.
This piece is about those four questions.
The 4-Question Test
The test is the conversation worth having before any AI tool gets bought. Four questions, run on the actual task, not on AI in general. Pass all four and AI fits cleanly. Fail one, and either fix that gap first or accept this isn’t an AI job.
Each question filters something different. Most owners ask the first one and stop. The second one is overlooked. The third quietly kills more deployments than any other. The fourth turns small mistakes into real bills.
The four:
- Repeatable? Does the same task come back in roughly the same shape, week after week?
- Showable? Could you hand over the templates, past examples and rules to someone tomorrow?
- Checkable? Could a competent person spot a wrong output in under a minute?
- Recoverable? If one output is wrong, can you fix it without major damage?
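For readers who think in checklists, the four questions can be sketched as a small decision helper. This is only an illustration, not a tool the article describes: the names TaskAssessment, NEXT_STEP, failed_questions and verdict are hypothetical, and the yes/no answers still have to come from someone who knows the task.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskAssessment:
    """Yes/no answers to the four questions for one task (hypothetical structure)."""
    name: str
    repeatable: bool
    showable: bool
    checkable: bool
    recoverable: bool


# Suggested response when a question fails, paraphrased from the article.
# Dict order follows Q1..Q4.
NEXT_STEP = {
    "repeatable": "keep it human, the task does not come back in a stable shape",
    "showable": "write down the templates, examples and rules first",
    "checkable": "narrow the task until a one-minute review works, or keep it human",
    "recoverable": "add a human sign-off on the last step before go-live",
}


def failed_questions(task: TaskAssessment) -> List[str]:
    """Return the questions this task fails, in Q1..Q4 order."""
    return [q for q in NEXT_STEP if not getattr(task, q)]


def verdict(task: TaskAssessment) -> str:
    """Summarise whether to trial a tool or which gaps to fix first."""
    failed = failed_questions(task)
    if not failed:
        return f"{task.name}: passes all four, worth trialling a tool"
    steps = "; ".join(f"{q}: {NEXT_STEP[q]}" for q in failed)
    return f"{task.name}: fix first ({steps})"


# Example answers are assumptions based on the opening story.
quote_variations = TaskAssessment("Quote variations", True, True, False, False)
print(verdict(quote_variations))
```

The value of writing it down this way is that a failed question always maps to a next action, rather than to a verdict on the tool.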
What follows is each question, what it actually filters, where it fails most often, and what to do when it does.
Question 1 — Repeatable?
The easy one, and the one most owners do think about. Does this task come back in roughly the same shape, week after week?
A tenancy renewal letter is repeatable. The wording shifts slightly tenant to tenant; the structure doesn’t. RAMS documents are repeatable — 90% of every site’s safety plan is the previous site’s, with the location header swapped. Booking confirmations are repeatable. Site reports against a client template are repeatable. AML chase emails are repeatable.
A tricky landlord dispute reply is not repeatable. A bespoke proposal for an architectural client building something the studio has never built before is not repeatable. A first conversation with a difficult guest is not repeatable.
This question disqualifies a task quickly when it fails. When it passes, it tells you nothing about whether AI will actually deliver — which is the bit owners miss. Repeatability is necessary, not sufficient.
Question 2 — Showable?
The hidden disqualifier. Could you hand over the templates, past examples and rules to someone tomorrow morning?
This is the question that separates a task an AI tool can do from one it can’t, even when both look identical from the outside. AI doesn’t have intuition about how the firm does things. It needs the context spelled out — the past quotes, the standard clauses, the tone of voice, the rules a senior person applies without thinking. If those things exist in writing somewhere, they can be handed over. If they sit in one person’s head, they can’t. (We unpacked why this matters at the model level in What AI Actually Does — context is the lever, not the model.)
A booking confirmation is showable. Show the model the last twenty confirmations and the rules — earliest check-in time, what to do about pets, the deposit line, the cancellation policy — and the next one writes itself.
Pricing strategy for a difficult new instruction is not showable. The senior negotiator reading the market and balancing the agency’s reputation against the landlord’s expectations is doing something that has never been written down, because it changes case by case. The context isn’t transferable because the context doesn’t exist as a document.
When this question fails, the answer is not to buy AI. The answer, if the work is genuinely worth handing off later, is to start writing down what the senior person knows. That’s a six-week documentation project. It is also the reason most “let’s roll out AI” initiatives stall at week three.
Question 3 — Checkable?
The slow killer. Could a competent person spot a wrong output in under a minute?
This is where the construction firm’s quote variations failed. The output looked right. It read right. A senior eye reviewing it for thirty seconds — which is the realistic review window any busy owner gives a draft — could not have caught the swapped line item. It would have taken a careful read against the original source quote to spot the error, which is precisely the work the AI was supposed to remove.
Checkability is a property of the output, not the model. A draft renewal letter is checkable in under a minute: read the rent figure, read the dates, read the tone, done. A site report against a client template is checkable: scan for the four mandatory sections, glance at the dates, done. A booking confirmation is checkable in fifteen seconds.
A long contract review where AI has extracted ten clauses needing attention is not checkable in under a minute — verifying it found all the clauses requires reading the contract anyway. A complex quote with thirty line items is borderline at best. Anything that requires the reviewer to do the original work to verify the AI’s work is failing this question.
The right response when this question fails is either to narrow the task until it becomes checkable or to keep it human. “Draft the renewal letter” passes. “Reply to anything tricky in the inbox” doesn’t, because the reviewer has to read each tricky email properly to know whether the AI handled it sensibly.
Question 4 — Recoverable?
The catastrophic one. If one output is wrong and slips through, can you fix it without major damage?
Most AI tasks are recoverable. A wrong renewal letter goes out, the tenant rings up confused, the office corrects it, the relationship is fine. A wrong booking confirmation gets caught in the morning sweep, an apology email goes out, the guest checks in anyway. A wrong site report draft gets caught at internal review and revised before the client sees it.
Some are not. A quote signed by a client at the wrong number locks the firm into the wrong number. A Section 21 served with the wrong date is invalid and the eviction process restarts. An AML check missed creates a regulatory problem nobody wants. A safety document with the wrong site address is a real liability if anything goes wrong on the day.
This question doesn’t usually disqualify a task entirely — most tasks can pass Q4 if the workflow is set up so a human signs off on the last step. The work the question does is to surface where that sign-off needs to be added, before the workflow goes live. Skipping Q4 is how the £620 quote happened.
The pattern the four questions catch
Most rollouts that fail in small businesses fail one of these four — usually Q2 or Q3 — and the team mistakes it for a tool problem.
The shape of the misread is the same across sectors. A construction firm tries AI on quote variations, the output slips through with errors, and the conclusion is “the tool isn’t good enough”. A letting agency tries AI on a tricky landlord email, the output is generic, and the conclusion is “AI isn’t useful for us”. A boutique hotel tries AI on a creative menu description, the output is bland, and the conclusion is “we tried it, didn’t work”. The same misread crops up in cleaning and facilities firms too: different vocabulary, same wrong conclusion.
In each case, the tool isn’t the problem. The construction firm’s quote task failed Q3: the output looks right but isn’t quickly checkable. With no sign-off before sending, it failed Q4 as well. The letting agency’s task failed Q1 and Q2: unique, judgement-led, no context to show. The hotel’s task failed Q2: the firm’s voice and the dish’s intent sit in the chef’s head, not in writing.
Once the questions are run honestly, the answer in each case is different. The construction firm needs a workflow change: a short reconciliation step against the original quote before sending. The letting agency needs to leave that task with the negotiator. The hotel needs the chef to spend a day writing down what makes a dish description work for them — and then AI fits.
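As a rough sketch of what a reconciliation step for the construction firm could look like, the snippet below compares each line item in a reformatted draft against the original quote and flags a materials/labour swap. It assumes both quotes are already available as structured data keyed by line-item description, which is a big assumption for many firms. The function name reconcile, the Quote type and all figures are invented for illustration.

```python
from decimal import Decimal
from typing import Dict, List, Tuple

# (materials, labour) in pounds, keyed by line-item description.
# Hypothetical structure: real quotes would need parsing into this shape first.
Quote = Dict[str, Tuple[Decimal, Decimal]]


def reconcile(source: Quote, draft: Quote) -> List[str]:
    """Return human-readable mismatches between the source quote and the draft."""
    problems = []
    for item in sorted(source.keys() | draft.keys()):
        if item not in draft:
            problems.append(f"{item}: missing from draft")
        elif item not in source:
            problems.append(f"{item}: not in source quote")
        elif source[item] != draft[item]:
            materials, labour = source[item]
            if draft[item] == (labour, materials):
                problems.append(f"{item}: materials and labour swapped")
            else:
                problems.append(f"{item}: figures differ from source")
    return problems


# Invented figures, for illustration only.
source_quote = {
    "Roof timbers": (Decimal("900.00"), Decimal("600.00")),
    "Guttering": (Decimal("310.00"), Decimal("450.00")),
}
draft_quote = {
    "Roof timbers": (Decimal("600.00"), Decimal("900.00")),  # swapped
    "Guttering": (Decimal("310.00"), Decimal("450.00")),
}

for problem in reconcile(source_quote, draft_quote):
    print(problem)
```

A check like this turns a thirty-line read-through into a short list of flagged items, which is what brings the task back inside a one-minute review. An empty list means the draft matches the source figures, not that the quote itself is right.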
What the test is actually for
The four questions don’t replace judgement about a task. They make the judgement structured.
Most owners trying AI for the first time pick a tool, point it at a task, and form a verdict based on one early output. The test inverts the order. Pick a task, ask the four questions, and the answer about whether to buy any tool at all becomes almost automatic. If all four pass, there’s a tool worth trialling. If one fails, fix that gap first — sometimes the gap is the work — and the tool conversation happens later.
The compound effect across a business is the bit owners miss. Apply the test to twenty tasks across the week and a clear pattern usually emerges. Three or four pass cleanly. A handful fail on Q2 — meaning the firm has knowledge in heads that should be in writing anyway, AI or not. A few fail on Q4 — meaning sign-off workflows need designing. Most of the rest stay human, which was always the right answer, and the team’s time goes back to the work that needs them.
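The sweep itself is just a tally. A minimal sketch, assuming each task has already been given four yes/no answers: bucket every task by the first question it fails and count the buckets. The task list and answers below are illustrative assumptions drawn from the examples in this piece, and the names QUESTIONS, bucket and sweep are hypothetical.

```python
from collections import Counter
from typing import Dict

QUESTIONS = ("repeatable", "showable", "checkable", "recoverable")


def bucket(answers: Dict[str, bool]) -> str:
    """Label a task by the first question it fails, or as a clean pass."""
    for question in QUESTIONS:
        if not answers[question]:
            return f"fails {question}"
    return "passes all four"


def sweep(tasks: Dict[str, Dict[str, bool]]) -> Counter:
    """Count how many tasks land in each bucket."""
    return Counter(bucket(answers) for answers in tasks.values())


def answers(r: bool, s: bool, c: bool, v: bool) -> Dict[str, bool]:
    """Small helper to build one task's answers in Q1..Q4 order."""
    return dict(zip(QUESTIONS, (r, s, c, v)))


# Illustrative answers only; a real sweep would cover the firm's own week.
week = {
    "Tenancy renewal letter": answers(True, True, True, True),
    "Booking confirmation": answers(True, True, True, True),
    "Pricing a difficult new instruction": answers(True, False, True, True),
    "Long contract review": answers(True, True, False, True),
    "Section 21 notice": answers(True, True, True, False),
    "Tricky landlord dispute reply": answers(False, False, False, True),
}

for label, count in sweep(week).most_common():
    print(f"{count} x {label}")
```

Bucketing by the first failed question keeps the tally simple. A task that fails several questions is counted once, under the earliest gap that needs fixing.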
That twenty-task sweep is exactly the work that turns into a real Admin Tax number — the £8,000–£20,000 per person per year most small businesses don’t realise they’re paying. If you’d rather sort your own list with someone, that’s exactly what an AI Strategy & Operations Audit is built around — running the four questions across your actual week and separating the AI jobs from the process jobs from the decisions that have been avoided. For the line items that pass cleanly, a Bespoke AI & Automation Build is what gets them off your team’s plate properly.
The questions are doing the work either way. Run them on three tasks this week before any tool gets bought. If you’d like a first-pass number on what those tasks are actually costing you, the free AI Value Calculator gives you an instant estimate of hours reclaimed and annual savings.
Want help running the four questions on your own tasks?
Book a free consultation. We’ll take three tasks from last week, run them through the test, and tell you plainly which ones are AI-ready, which need a process fix first, and which should stay human — no jargon, no commitment.