The 6-month payback rule that finance teams apply to AI projects is structurally backwards: it reliably kills the projects that compound (eval libraries, prompt registries, agent skills, observability platforms) and reliably approves the projects that produce a one-time productivity gain and then plateau. A discipline borrowed from CapEx accounting becomes the single most expensive misapplication in an organization’s AI portfolio.
This piece names the paradox, replaces the single-gate rule with three staged gates (90-day eval-feedback, 12-month capability, 24-month compounding), and shows what gets killed at each gate. It is a spoke under the AI project economics manifesto, which argues for the broader economics framework this gating system implements.
Where 6-month payback comes from
The 6-month payback heuristic is borrowed from short-cycle CapEx and operating-investment frameworks. It works well in two narrow domains: low-uncertainty productivity tooling (a CRM upgrade, a marketing automation platform) and short-cycle capital equipment with stable economics (a faster printer, a better warehouse robot). In both cases the value is approximately a fixed annuity: the tool replaces a known volume of work at a known cost, and the payback math is roughly linear.
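The linear payback math described above fits in a few lines. This is a minimal sketch, and the figures (a 120,000 upfront cost and a 25,000 monthly saving) are hypothetical values chosen only for illustration:

```python
def payback_months(upfront_cost: float, monthly_saving: float) -> float:
    """Months until cumulative savings cover the upfront cost, assuming a fixed monthly saving."""
    if monthly_saving <= 0:
        raise ValueError("a fixed-annuity payback needs a positive monthly saving")
    return upfront_cost / monthly_saving


def passes_payback_rule(upfront_cost: float, monthly_saving: float,
                        limit_months: float = 6.0) -> bool:
    """True when the tool pays for itself inside the payback window."""
    return payback_months(upfront_cost, monthly_saving) <= limit_months


# Hypothetical CRM upgrade; the numbers are made up for illustration.
months = payback_months(upfront_cost=120_000, monthly_saving=25_000)
approved = passes_payback_rule(120_000, 25_000)
```

The calculation is only meaningful when the monthly saving is known and stable, which is exactly the fixed-annuity assumption.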
It is also a CFO discipline tool, not just a math model. A 6-month rule communicates that the org will not fund science projects, that vendors must defend value upfront, and that budget owners stay accountable for outcomes inside their tenure. Those are good instincts. They are also the instincts that misfire on AI work.
Why 6-month payback is wrong for AI
Three structural reasons.
First, AI value is not a fixed annuity. As we argue in the ROI calculator critique, AI value is shaped by eval threshold effects (binary, not linear), model upgrade resets (the curve resets quarterly), regression cost, and trust loss. The 6-month math assumes steady cash flow per month after launch; actual AI cash flow is a step function with quarterly resets and a regression tail. Forcing the actual shape onto the assumed one produces a payback number that is not just imprecise; it has the wrong sign.
Second, AI projects compound through assets that take longer than 6 months to produce: the eval suite that lets the next four projects start at week one instead of week eight, the prompt registry that lets a new agent inherit ten months of iteration, the agent skill library. None of these assets exist in month 6. Most do not exist in month 9. By month 18 they are the largest line on the balance sheet of an AI-mature organization. A 6-month rule does not just under-price these assets; it forbids building them.
Third, AI capability thresholds shift on a timescale incompatible with 6-month windows. A project at a 0.74 eval score in month 5, below the 0.80 threshold, has zero deployable value in the 6-month window. The same project at 0.83 in month 8 delivers its entire value. The 6-month rule kills the project at the first reading; staged gates fund it through to the second.
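The threshold effect in this example can be made concrete. The sketch below treats deployable value as binary in the eval score, using only the two observations from the text (0.74 in month 5, 0.83 in month 8); the function names and the normalized `full_value` of 1.0 are assumptions for illustration:

```python
from typing import Optional


def deployable_value(eval_score: float, threshold: float, full_value: float) -> float:
    """Threshold-gated value: zero below the locked threshold, full value at or above it."""
    return full_value if eval_score >= threshold else 0.0


def first_deployable_month(scores_by_month: dict[int, float], threshold: float) -> Optional[int]:
    """Earliest month whose eval score clears the threshold, or None if none does."""
    for month in sorted(scores_by_month):
        if scores_by_month[month] >= threshold:
            return month
    return None


THRESHOLD = 0.80
scores = {5: 0.74, 8: 0.83}  # the two observations from the example in the text

value_at_month_5 = deployable_value(scores[5], THRESHOLD, full_value=1.0)
value_at_month_8 = deployable_value(scores[8], THRESHOLD, full_value=1.0)
crossing_month = first_deployable_month(scores, THRESHOLD)
inside_six_month_window = crossing_month is not None and crossing_month <= 6
```

A linear payback model would credit the month-5 project with most of its value; the step function credits it with none until the threshold is crossed.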
The 90-day eval-feedback gate
The first gate, set 90 days from kickoff. The question is not “have we paid back”; that question is malformed at 90 days for any serious AI project. The question is: “is the eval curve rising and is the unit cost falling, on a trajectory that justifies the next 90 days of investment?”
What evidence the gate looks at:
- Eval score trajectory. Plot eval score on the locked test set against time, weekly. A rising curve, even from a modest starting point, earns continuation. A flat curve, even at a respectable absolute score, raises hard questions. A declining curve kills the project unless there is a specific named cause (model swap mid-evaluation, test-set version change) and a remediation plan.
- Unit cost trajectory. Cost-per-completion (input + output + reasoning tokens, plus any retrieval cost) plotted weekly. A falling curve indicates the team is genuinely optimizing. A rising curve indicates the team is throwing tokens at quality problems, which is a structural failure mode that gets worse at scale.
- Eval suite quality. Is the test set growing? Is the harness running on every PR? Is the threshold-locking process documented? At 90 days the eval suite is the deliverable that matters most, more than any feature shipped.
What earns continuation: a demonstrably rising eval curve, a stable or falling unit cost, and an eval suite the buyer has read access to and can score against. The features can be partial; the discipline cannot be.
What gets killed at this gate: projects with no eval suite, projects whose eval score has been flat for six weeks with no mechanism to move it, projects whose unit cost is climbing because the team is using model spend as a substitute for prompt and retrieval discipline. Killing here is cheap. Killing in month 9 is not.
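The 90-day criteria reduce to two trend checks and one existence check. One way to sketch them, assuming weekly series of eval scores and cost-per-completion values: the least-squares slope, the tolerance values, and all function names are illustrative choices rather than a prescribed method, and the gate still needs a human to judge whether a flat or declining curve has a named cause.

```python
def ols_slope(values: list[float]) -> float:
    """Least-squares slope of values against their index (week 0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two weekly points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def gate_90_day(weekly_eval: list[float], weekly_unit_cost: list[float],
                has_eval_suite: bool,
                eval_tolerance: float = 0.002,      # assumed: score change per week treated as flat
                cost_tolerance: float = 0.005       # assumed: relative cost change per week treated as stable
                ) -> tuple[bool, list[str]]:
    """Return (passed, reasons). An empty reasons list means continuation is earned."""
    if not has_eval_suite:
        return False, ["no eval suite"]
    reasons = []
    eval_slope = ols_slope(weekly_eval)
    if eval_slope < -eval_tolerance:
        reasons.append("eval curve declining")
    elif eval_slope <= eval_tolerance:
        reasons.append("eval curve flat")
    mean_cost = sum(weekly_unit_cost) / len(weekly_unit_cost)
    if ols_slope(weekly_unit_cost) / mean_cost > cost_tolerance:
        reasons.append("unit cost climbing")
    return (not reasons), reasons


# Hypothetical six-week series for two projects.
healthy = gate_90_day([0.60, 0.62, 0.65, 0.66, 0.70, 0.72],
                      [0.050, 0.048, 0.047, 0.045, 0.044, 0.042], True)
stalled = gate_90_day([0.78] * 6,
                      [0.040, 0.045, 0.050, 0.055, 0.060, 0.065], True)
```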
The 12-month capability gate
The second gate, set 12 months from kickoff. The question evolves: “did the system cross the locked eval threshold against production traffic, with observability and a maintenance retainer in place?”
What evidence the gate looks at:
- Production eval performance. A 30-day window in which the system holds at or above its locked threshold against real traffic. Not a synthetic test set. Not a scaled-down pilot. Real volume, real distribution, real stakes.
- Observability stack in operation. Traces stored, dashboards live, online eval scoring against sampled traffic, regressions surfacing within 48 hours. If observability is “delivered” as a one-time milestone but no one is reading the traces, the system is operating blind, regardless of whether the dashboards exist.
- Maintenance retainer signed. Not “we’ll figure it out post-launch.” A signed retainer with named SLAs, sized as 25–40 percent of build cost annualized. Without a retainer the project is shipping a system whose eval curve will drift downward unattended for the next year.
- Capability claims from the value model. Are the three to five capabilities the project promised in its 4-component scoring shipping at threshold? If the capability claim was “agent triages Tier-1 escalations at 0.83 accuracy, closing 40 percent without human review,” is the agent doing that?
What earns continuation: production eval at threshold, observability in operation, retainer signed, capability claims realized. This is the gate that distinguishes a working AI feature from a permanently-pilot AI feature. Most projects that fail in production fail by being stuck in pilot at month 18 with no clear path through this gate.
What gets killed at this gate: projects that ship a deployable demo but no observability, projects without a maintenance retainer, projects whose capability claims have quietly been re-scoped to “we shipped something.” Twelve months of compounding investment is unlocked for projects that pass; projects that fail get descoped to a small support budget rather than continued capital expenditure.
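The 12-month evidence can be collected into a single record and checked mechanically. The sketch below is a hypothetical structure, not a prescribed schema: the field names are invented, it assumes one production eval score per day, and it treats the lower end of the 25 to 40 percent retainer range as a floor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CapabilityGateEvidence:
    daily_prod_eval: list[float]           # one production eval score per day (assumed granularity)
    locked_threshold: float
    regression_detection_hours: float      # observed time for regressions to surface
    retainer_annual_cost: Optional[float]  # None means no signed retainer
    build_cost: float
    claims_promised: int
    claims_at_threshold: int


def gate_12_month(e: CapabilityGateEvidence) -> list[str]:
    """Return the list of failures; an empty list means the project passes."""
    failures = []
    window = e.daily_prod_eval[-30:]
    if len(window) < 30 or min(window) < e.locked_threshold:
        failures.append("did not hold threshold for 30 days of production traffic")
    if e.regression_detection_hours > 48:
        failures.append("regressions take longer than 48 hours to surface")
    if e.retainer_annual_cost is None:
        failures.append("no signed maintenance retainer")
    elif e.retainer_annual_cost < 0.25 * e.build_cost:
        failures.append("retainer below 25 percent of build cost")
    if e.claims_at_threshold < e.claims_promised:
        failures.append("capability claims not fully shipped")
    return failures


# Hypothetical projects: one that passes, one stuck in pilot.
passing = CapabilityGateEvidence([0.84] * 30, 0.83, 24, 90_000, 300_000, 4, 4)
stuck = CapabilityGateEvidence([0.84] * 20, 0.83, 24, None, 300_000, 4, 4)
```

A check like this cannot tell whether anyone is reading the traces; that part of the observability criterion remains a review question.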
The 24-month compounding gate
The third gate, set 24 months from kickoff. By now the productivity-substitution math has either materialized or it has not, and that is the easy part of the question. The hard part is: “did the project produce a platform (eval library, prompt registry, agent skills, observability harness) that the next AI project starts from, or did it produce a feature with no successor?”
What evidence the gate looks at:
- Platform asset utilization. Is the eval library being reused on the next two AI projects? Has the prompt registry been adopted as the default tool for the next agent’s prompt versioning? Are the agent skills being inherited? If the answer is “yes” with named successor projects, the platform has compounded. If the answer is “the next project started from scratch,” the platform asset claim was a cost without a payoff.
- Marginal cost on the successor project. A successor AI project that bootstraps from a real platform should cost meaningfully less than the original. The number we have seen across mature AI engineering shops is 40–60 percent of the original cost for the first follow-on project, dropping further on the second and third. If the successor cost is approximately the same as the original, the platform claim has not held.
- Eval bar progression. The eval bar should be rising over the 24 months: newer test cases, harder edge cases, more rigorous threshold definitions. A flat eval bar across 24 months is a sign the project is in maintenance mode rather than capability expansion mode, which is fine, but should be priced accordingly.
What earns continuation: platform assets in use on successor projects, falling marginal cost per project, rising eval bar. This is where AI investment compounds rather than churns.
What gets killed at this gate: not “the project,” typically; by month 24 the original project is either in production or it is not. What gets killed is the strategy of treating each AI project as a one-off feature build. Organizations that pass this gate institutionalize platform investment as a separate budget line; organizations that fail it discover at month 30 that they have built five disconnected features and are about to fund a sixth disconnected one.
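The successor-cost evidence is simple arithmetic. A minimal sketch, assuming a mapping of named successor projects to their costs: the project names and figures are hypothetical, and the 0.60 cutoff takes the upper end of the 40 to 60 percent range quoted above as the pass line, which is an interpretive choice.

```python
def gate_24_month(original_cost: float,
                  successor_costs: dict[str, float],
                  eval_bar_raised: bool,
                  max_ratio: float = 0.60) -> list[str]:
    """Return findings against the compounding gate; an empty list means the platform claim held.

    successor_costs maps each named successor project that reuses the platform
    assets to its build cost.
    """
    findings = []
    if not successor_costs:
        findings.append("no named successor project reuses the platform assets")
    for name, cost in successor_costs.items():
        ratio = cost / original_cost
        if ratio > max_ratio:
            findings.append(f"{name} cost {ratio:.0%} of the original")
    if not eval_bar_raised:
        findings.append("eval bar flat over 24 months")
    return findings


# Hypothetical portfolio: the original build cost 400,000.
compounding = gate_24_month(400_000, {"agent-b": 200_000}, eval_bar_raised=True)
churning = gate_24_month(400_000, {"agent-b": 390_000}, eval_bar_raised=False)
```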
What gets killed at each gate
| Gate | Time | What earns continuation | What gets killed |
|---|---|---|---|
| 90-day eval-feedback | t+90d | Rising eval curve, falling unit cost, eval suite present | No eval suite, flat eval, climbing unit cost |
| 12-month capability | t+12mo | Production threshold, observability live, retainer signed | Stuck in pilot, no observability, no retainer |
| 24-month compounding | t+24mo | Platform assets reused, falling marginal cost, rising eval bar | One-off features, no successor projects, flat eval bar |
Three gates, each cheap to enforce, each with criteria that reduce to artifacts (eval reports, dashboards, retainer contracts, successor-project cost numbers). None of the criteria are vibes.
The gates compose. A project that passes gate 1 has earned the right to gate 2’s investment. A project that passes gate 2 has earned the right to gate 3’s compounding. A project that fails any gate gets descoped or killed before it produces a 24-month bill. The portfolio shape that emerges is one in which weak projects get killed cheaply, mid-strength projects ship without compounding, and strong projects compound; exactly the shape that 6-month payback rules invert.
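The composition is sequential: a later gate is only reached by passing the earlier one. A sketch of the resulting portfolio shape, with hypothetical outcome labels:

```python
from enum import Enum


class Outcome(Enum):
    KILLED_EARLY = "killed or restarted at 90 days"
    DESCOPED = "descoped at 12 months"
    SHIPPED = "in production, not compounding"
    COMPOUNDING = "in production and compounding"


def portfolio_outcome(passed_90d: bool, passed_12mo: bool, passed_24mo: bool) -> Outcome:
    """Evaluate the gates in order; results for later gates are ignored once one fails."""
    if not passed_90d:
        return Outcome.KILLED_EARLY
    if not passed_12mo:
        return Outcome.DESCOPED
    if not passed_24mo:
        return Outcome.SHIPPED
    return Outcome.COMPOUNDING


weak = portfolio_outcome(False, False, False)
mid_strength = portfolio_outcome(True, True, False)
strong = portfolio_outcome(True, True, True)
```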
How to talk to a CFO about staged payback
Three moves.
Move one: agree on what 6-month payback is good at. It is the right rule for productivity-substitution AI work: narrow automation of high-volume, low-stakes tasks where a clean human baseline exists. Carve out a productivity-substitution lane in the portfolio, apply 6-month payback there, and call it done.
Move two: name the projects that are not productivity-substitution. Capability-expanding projects, platform-building projects, downside-risk projects, optionality projects. None of them pay back in 6 months. Many of them have legitimate value. The CFO’s job is not to apply the wrong rule to them; it is to apply the right gates.
Move three: install the three gates as named portfolio reviews on the calendar. Quarterly enough to be real, slow enough not to micromanage. The 90-day reviews look at the youngest projects against eval and unit cost trajectory. The 12-month reviews look at the in-flight projects against production performance and observability. The 24-month reviews look at the cumulative portfolio against platform asset utilization. The CFO chairs all three.
The discipline is the same as 6-month payback (projects must defend their continuation against named criteria), but the criteria now match the actual shape of AI value. CFOs who object to staged payback usually object on one of two grounds: “this is too complicated” (it is not; it is three reviews on the calendar) or “I cannot defend it to the board” (the board does not want six months of AI portfolio churn either; show them the gates).
Frequently asked questions
What is the 6-month AI payback paradox?
The 6-month payback rule that CFOs apply to AI projects reliably kills the projects that compound (eval libraries, prompt registries, agent skills, observability platforms) and reliably approves the projects that produce a one-time productivity gain and then plateau. The paradox: the more rigorously the 6-month rule is applied, the more the AI portfolio churns rather than compounds.
Why does AI value not pay back like CapEx?
Because AI value is not a fixed annuity. Eval threshold effects make value binary. Model upgrade resets shift the curve quarterly. Regression cost adds tail risk linear math cannot price. Trust loss adds a non-linear opportunity cost. The 6-month rule assumes a steady cash flow per month after launch; the actual shape of AI cash flow is a step function with quarterly resets and a regression tail.
What replaces the 6-month payback rule?
Three staged gates: a 90-day eval-feedback gate (is the eval curve rising and unit cost falling), a 12-month capability gate (production threshold, observability, retainer), a 24-month compounding gate (platform assets reused, falling marginal cost, rising eval bar). Each has named artifact-based criteria, not vibes.
What does the 90-day gate look at?
Eval score trajectory, unit cost trajectory, and eval suite quality. The question is whether the trajectory justifies the next 90 days. Rising eval, falling unit cost, eval suite present: continue. Flat or declining eval, climbing unit cost, no eval suite: kill or restart.
What is the 12-month capability gate?
It asks whether the system holds at threshold against production traffic for 30 days, whether observability is in operation, whether a maintenance retainer is signed at 25 to 40 percent of build cost annualized, and whether capability claims have shipped. Projects that pass earn the next twelve months of compounding investment.
What is the 24-month compounding gate?
It asks whether the project produced a platform (eval library, prompt registry, agent skills, observability harness) that successor AI projects bootstrap from. Evidence: platform assets reused on at least one named successor, marginal cost on the successor at 40 to 60 percent of the original, rising eval bar over 24 months.
Should every AI project pass through all three gates?
No. Productivity-substitution AI work (narrow automation of high-volume, low-stakes tasks with a clean human baseline) should remain in a 6-month payback lane because the legacy math is approximately correct for that class. Everything else (capability-expanding, platform-building, downside-risk, optionality) goes through the three gates. The CFO carves the portfolio into the two lanes and applies the right rule to each.
How does this relate to the 4-component value model?
The 4-component value model produces the inputs (capability earned, time-to-value, downside-risk-reduced, optionality created) at scoring time. The three gates are the kill criteria over time. Together they replace the single-number ROI calculator with a portfolio model that tracks each project’s profile against named milestones at 90 days, 12 months, and 24 months.
What is the most common failure mode of staged payback?
Letting the 12-month gate slide. Projects that should have shipped to production by month 12 stay in pilot at month 15, then month 18, then month 24, accumulating cost without producing the platform assets a successful project produces. The 12-month gate is the most expensive one to skip; the 90-day gate is too early for serious payback questions and the 24-month gate is too late to recover a project that missed production. The discipline lives at month 12.
Key takeaways
- The 6-month payback rule kills the AI projects that compound and approves the ones that plateau. Borrowed from CapEx accounting, it is the wrong rule for the wrong shape of cash flow.
- AI value is not a fixed annuity. Eval threshold effects, model upgrade resets, regression cost, and trust loss all violate the assumptions the 6-month rule depends on.
- Replace it with three staged gates: 90-day eval-feedback, 12-month capability, 24-month compounding. Each has named artifact-based criteria.
- The 90-day gate kills cheaply: no eval suite, flat eval, climbing unit cost. The 12-month gate distinguishes shipped systems from permanent pilots. The 24-month gate distinguishes platforms from one-off features.
- Productivity-substitution AI work stays in a 6-month payback lane because the legacy math fits there. Everything else uses the three gates.
- The 12-month gate is the most-skipped and the most expensive to skip. The discipline lives there.
The 6-month payback rule is a CFO instinct that worked for 2018 software. AI projects need a CFO discipline that fits AI cash flow, not a borrowed rule that inverts the portfolio.
Arthur Wandzel