What I learned writing 80+ user stories per sprint on a US state government program

There is a way of writing software requirements at high volume that produces 99 percent defect resolution and roughly 35 percent less rework once the discipline takes hold. We have seen it on regulated digital transformation programs running 80-plus user stories per sprint, with stakeholders in one time zone, engineers in another, QA in a third, and a defect tracker as the single source of truth. The discipline travels.

We use it directly when scoping AI implementations for mid-market clients. This post is the short version of what survived contact with production AI work.

A user story is a contract, not a description

The first thing high-volume requirements work teaches you is that a user story is not a sentence. It is a contract.

As a [role], I want to [action], so that [outcome].

The reason this template exists is not to make stories sound consistent. It is to force three things into the open before a single line of code is written:

Who is the actor. Is it a caseworker, a parent, an admin, an automated system?
What is the action. Is it a one-time submission, a recurring review, an exception?
What is the outcome. What does success look like, in language the actor would use?

When any of those three is missing or fuzzy, the story is not ready. Not later, not in clarification, not in a follow-up Slack thread. Now. The cost of writing a story without all three is rework after sprint three when QA discovers the actor was actually two different roles.
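The readiness rule above can be sketched as a small check. This is a minimal illustration in Python, not a tool from the program described here; the `UserStory` class, its field names, and the sample caseworker story are all hypothetical. Note that a check like this only catches a missing part. Catching a fuzzy one still takes a person reading the story.

```python
from dataclasses import dataclass


@dataclass
class UserStory:
    """Hypothetical container for the three parts of the story template."""
    role: str
    action: str
    outcome: str

    def missing_parts(self) -> list[str]:
        """Return the names of any parts that are empty or only whitespace."""
        parts = {"role": self.role, "action": self.action, "outcome": self.outcome}
        return [name for name, text in parts.items() if not text.strip()]

    def is_ready(self) -> bool:
        """A story is ready only when all three parts are filled in."""
        return not self.missing_parts()

    def render(self) -> str:
        """Format the story using the template from the text."""
        return f"As a {self.role}, I want to {self.action}, so that {self.outcome}."


# Illustrative draft with no outcome written yet.
draft = UserStory(role="caseworker", action="review a submitted application", outcome="")
print(draft.is_ready())       # False
print(draft.missing_parts())  # ['outcome']
```

The point of returning the missing parts, rather than a bare yes or no, is that the author of the story learns exactly what to go back and ask.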

This habit applies cleanly to AI work. When a mid-market COO says, "I want our team to spend less time on data entry," that is not a user story. It is a wish. The story might be:

As a customer service rep, I want incoming inquiry emails to be automatically tagged with category, urgency, and customer history, so that I can triage in 30 seconds instead of three minutes.

That is testable. That is buildable. That has acceptance criteria.

Acceptance criteria do the work the story cannot

A user story tells you the shape of the feature. Acceptance criteria tell you when it is done.

The format we use is:

Given [precondition], when [action], then [observable outcome].

Three or four of these per story. No more. If you need ten acceptance criteria, the story is too big and needs to be split.

The discipline is in the word "observable." Every criterion has to be something a tester can demonstrate without asking a developer how the code works. Internal state changes do not count. Database flag flips do not count. The criterion is what the user, or an integrating system, can see.

For AI work this is even more important than for traditional software. AI features fail in subtle, probabilistic ways. If your acceptance criteria are vague ("the AI should give good answers"), you have no defense against drift. If they are observable ("Given a CSV of 5,000 sales rows, when a user asks 'show revenue by region for Q3', then the response includes a bar chart with regions on the x-axis and revenue on the y-axis, filtered to July through September of the current fiscal year"), you can test the AI like any other system.
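To make the example criterion concrete, here is a hedged sketch of how it might be checked in Python. The response shape (a dict with `chart_type`, `x_axis`, `y_axis`, and `months`) is an assumption made for illustration; a real AI feature would return whatever its own interface defines. The function name `check_q3_revenue_chart` is hypothetical, and the fiscal-year part of the criterion is left out to keep the sketch short.

```python
def check_q3_revenue_chart(response: dict) -> list[str]:
    """Return a list of observable failures; an empty list means the criterion passes."""
    failures = []
    if response.get("chart_type") != "bar":
        failures.append("expected a bar chart")
    if response.get("x_axis") != "region":
        failures.append("expected regions on the x-axis")
    if response.get("y_axis") != "revenue":
        failures.append("expected revenue on the y-axis")
    # Q3 in the example criterion means July through September (months 7, 8, 9).
    if sorted(response.get("months", [])) != [7, 8, 9]:
        failures.append("expected data filtered to July through September")
    return failures


# A hypothetical response that drifted: right chart, wrong date filter.
drifted = {"chart_type": "bar", "x_axis": "region", "y_axis": "revenue", "months": [4, 5, 6]}
print(check_q3_revenue_chart(drifted))
# ['expected data filtered to July through September']
```

Every check reads only what the user or an integrating system could see in the response. Nothing in it depends on how the model arrived at the answer.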

AS-IS before TO-BE, every time

Every feature should start with two diagrams. One is the AS-IS process, mapped from interviewing the actual person who does the job today. The other is the TO-BE process, the redesigned flow once the new system is in place.

You cannot skip the AS-IS step. We have watched teams skip it and pay for it later, every time.

The reason is that operational reality is always more complicated than anyone admits in a meeting. The person who has been doing the job for ten years has nine workarounds for things the official process does not handle. If you design the new system based on the official process, you are designing for a workflow that does not exist.

When we run an AI Operations Audit, the first deliverable is the AS-IS process for one operational area. Not a slide. A diagram and a written narrative, accurate enough that the person doing the work today can read it and say "yes, that is what I do." Only then do we talk about which steps an AI implementation should change.

The audit fee is partly for the audit itself. It is also partly for the discipline of forcing the AS-IS to exist, because mid-market companies almost never have one written down.

The "why" line that changes everything

Regulated transformation work is requirements-heavy. You are writing for stakeholders who will sign in blood that the system does what the document says it does. You are also writing for engineers who, six months later, will need to know why a feature exists when they are tempted to refactor it away.

The single most useful sentence we add to every user story is a "rationale" or "why" line.

Why: This rule exists because [statute reference / policy decision / observed user behavior]. If this rule is changed, [downstream consequence].

That sentence does three things:

It documents the reason in the same place as the requirement, so future-you does not have to dig through Confluence to find it.
It exposes weak rationale. If you cannot write a sensible "why" line, the requirement may not actually be needed.
It surfaces dependencies. If the why is "because Section 5103.1.b mandates ten-day notice," then a future developer cannot quietly change the notice window without realizing they are creating a compliance issue.

For AI implementations, the why line is even more important because the AI behavior often depends on prompt engineering that looks arbitrary in code. If a prompt instructs Claude to "respond in 130 words or less," and the why line says "because the response is rendered inside a chat bubble with a max height of 240px on mobile," then a developer who later changes the limit knows what UI testing they need to do.
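One way to keep the why line next to the prompt instruction it explains is to store them in the same record. The sketch below is an illustration under assumptions, not a description of how any particular codebase does it: `PromptRule`, its fields, and `within_word_limit` are hypothetical names, and the word count is a simple whitespace split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptRule:
    """A prompt instruction stored together with its rationale (hypothetical structure)."""
    instruction: str
    why: str
    if_changed: str


# The values come from the chat-bubble example in the text.
WORD_LIMIT = PromptRule(
    instruction="Respond in 130 words or less.",
    why="The response is rendered inside a chat bubble with a max height of 240px on mobile.",
    if_changed="Re-test the chat bubble layout on mobile.",
)


def within_word_limit(text: str, limit: int = 130) -> bool:
    """Observable check: count whitespace-separated words in the response."""
    return len(text.split()) <= limit


print(WORD_LIMIT.instruction)
print("Why:", WORD_LIMIT.why)
print(within_word_limit("A short reply that fits in the bubble."))  # True
```

A developer who edits the limit has to open the record that also holds the reason and the downstream consequence, which is the whole purpose of the why line.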

Defect triage is a writing exercise

Toward the end of every sprint, defects get triaged. The job is deceptively simple: read the defect, decide if it is a real bug, a misunderstood requirement, or out of scope, and assign next steps.

Here is what we learned doing this at high volume:

A defect that is hard to triage is almost always a defect in the original requirement, not a defect in the code.

If the tester and the developer cannot agree on whether the system is behaving correctly, the requirement was ambiguous. The triage conversation is forensic evidence of where the requirements failed.

We keep a defect log on every AI engagement. Not because we expect a lot of bugs in a four-week sprint, but because the bugs we do see tell us exactly where the requirements were too thin. The next engagement gets sharper requirements in those areas.
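As a rough sketch of how a defect log can point at thin requirements, the Python below tallies defects that were triaged as misunderstood requirements, grouped by story area. The log entries, area names, and the function `thin_requirement_areas` are invented for illustration; only the three triage outcomes come from the text above.

```python
from collections import Counter

# Hypothetical defect log entries: (story area, triage outcome).
# The three outcomes mirror the triage decision described above.
defect_log = [
    ("email tagging", "misunderstood requirement"),
    ("email tagging", "misunderstood requirement"),
    ("email tagging", "real bug"),
    ("reporting", "out of scope"),
    ("reporting", "misunderstood requirement"),
]


def thin_requirement_areas(log):
    """Count defects per area that trace back to the requirement, most frequent first."""
    counts = Counter(
        area for area, outcome in log if outcome == "misunderstood requirement"
    )
    return counts.most_common()


print(thin_requirement_areas(defect_log))
# [('email tagging', 2), ('reporting', 1)]
```

The areas at the top of that list are where the next engagement's requirements need the most attention.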

What this means for AI implementations in 2026

The pattern in mid-market AI implementations right now is the opposite of what well-scoped requirements work demands. Vendors arrive with a tool and try to retrofit a process around it. Strategy decks describe outcomes without specifying acceptance criteria. Buyers sign engagements with no AS-IS process documented.

When the result is "the AI is not working," nobody can prove the assertion either way, because there is no contract to test against.

The discipline that produces 99 percent defect resolution at high volume is the same discipline that produces a working AI implementation in a mid-market firm. Roles, actions, outcomes. Observable acceptance criteria. AS-IS before TO-BE. The why line. Defect triage as a writing exercise.

That is senior business analyst (BA) work. The AI implementation is what you do once the requirements are right.

If you are a mid-market operations leader looking at AI vendors and feeling like the proposals are vague, that is not your imagination. The proposals are vague. You can either work with someone who fills in the BA layer, or you can hire one yourself. The AI Operations Audit is the productized version of that BA layer: two weeks, at a fixed price.


Sultan Siddiqui
