Most AI project board memos are written in the language of 2018 software: features shipped, milestones hit, story points burned, sentiment from the team. None of that language belongs in a 2026 board memo about AI work, because none of it answers the questions a fiduciary has to answer about an AI investment. A board memo for an AI project should be readable in seven minutes by a director with no engineering background, and it should make the kill-or-fund decision mechanical against named criteria. This piece is the template (sections, prompts, and a worked example) for a memo that does that.
It is a spoke of the AI project economics manifesto, which establishes evaluation as the unit of account. The board memo is the artifact that surfaces the eval data, the unit cost trajectory, and the staged-payback gate evidence to the people who approve the next 90 days of spend.
What an AI board memo has to do that a software board memo does not
A traditional software board memo answers: are we building the right thing, are we shipping it on time, are we spending the budget we said we would. Three questions, all answerable from a Jira board and a finance dashboard.
An AI board memo has to answer four additional questions that Jira and finance cannot reach:
- Is the system working? “Working” in AI is not “deployed” or “demoed.” It is a measured eval score against a named test set at or above a locked threshold, holding across recent traffic. A demo is not evidence; an eval report is.
- Is the unit cost going the right way? AI systems have a real per-completion cost that is co-determined with quality. A rising eval score on a falling unit cost is a healthy trajectory. A rising eval score on a doubling unit cost is a flag.
- Did the last frontier model upgrade move the curve? Three to five times a year a major model release forces a re-evaluation. The board needs to know what happened in the last cycle and what is expected in the next. The cadence is built into the work; pretending it is not is the most common board reporting failure.
- Is the project earning the next gate? Staged payback (90-day, 12-month, 24-month gates) replaces single-number ROI for AI work. The board memo has to surface where the project is in the gate sequence and what the kill or compound criterion at the next gate looks like.
A memo that does not answer those four questions is a 2018 memo dressed in 2026 vocabulary. The template below answers all four, in order, with no jargon the board has to ask about.
The 9-section template
A complete AI project board memo is nine sections. Total target length: 1,800 to 2,500 words, readable in seven minutes.
1. Headline finding. One sentence: kill, restart, fund-as-planned, or fund-with-expansion. The rest of the memo defends this sentence.
2. Eval scorecard. A table of the named eval sets, their versions, their current weighted score, the locked threshold, the delta since last cycle. Three to seven rows.
3. Unit cost trajectory. A small chart and a table: cost-per-completion (or cost-per-action, or cost-per-resolved-ticket; whichever is the canonical unit) over the last four reporting periods, with the projection for the next two.
4. Last model-upgrade cycle. What changed (Claude 4.7 to 4.8, GPT-5 to 5.1, etc.), what regressions appeared, what the triage cost was, what the new locked threshold is.
5. Risks and the regression tail. The two or three live risks: regression rate, vendor deprecation, eval set staleness, regulatory exposure. Named, sized, with a triage path each.
6. Stage-gate position. Where the project sits in the 90-day / 12-month / 24-month gate sequence and the named kill criteria for the next gate.
7. Budget actuals against milestones. Quarterly milestone variance: what was approved, what was spent, what was earned (eval-threshold pass), what is forecast.
8. Decisions requested. The named asks: continuation funding, scope expansion, retainer ramp, kill, restart. Each tied to evidence in sections 2 through 7.
9. Appendix: artifacts and links. Eval reports, unit-economics dashboard, regression triage log, retainer SLA. The board can drill in if it wants.
The order matters. A reader who stops after section 1 has the headline. A reader who stops after section 4 has the operational signal. A reader who stops after section 6 has the strategic position. The full memo is the audit trail.
Section-by-section guidance
Headline finding. Write this last, after the rest is drafted. It must be defensible from the body and survive a two-question challenge. Bad: “Continued strong progress.” Good: “Recommend Q3 funding at $410k against a Q3 milestone of ≥ 0.85 weighted eval and ≤ $0.034 cost-per-completion.”
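One way to keep the headline defensible is to compute the gap between the last measured figures and the milestone the headline names. The sketch below uses the numbers from the worked example later in this piece (Q2 close at 0.83 weighted eval and $0.038 per completion, against a Q3 target of 0.85 and $0.034). The `Milestone` class and `gap_to_milestone` function are hypothetical names chosen for illustration, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Milestone:
    """A milestone as named in the headline (hypothetical structure)."""
    min_eval_score: float   # weighted eval must be at or above this
    max_unit_cost: float    # cost-per-completion must be at or below this


def gap_to_milestone(score: float, unit_cost: float, m: Milestone) -> dict:
    """Return how far the current figures are from the milestone."""
    eval_gap = max(round(m.min_eval_score - score, 4), 0.0)   # points still to gain
    cost_gap = max(round(unit_cost - m.max_unit_cost, 4), 0.0)  # dollars still to cut
    return {
        "eval_points_to_gain": eval_gap,
        "cost_to_cut": cost_gap,
        "cost_cut_pct": round(100 * cost_gap / unit_cost, 1),
        "met": eval_gap == 0.0 and cost_gap == 0.0,
    }


# Figures taken from the worked example in this piece.
q3 = Milestone(min_eval_score=0.85, max_unit_cost=0.034)
gap = gap_to_milestone(score=0.83, unit_cost=0.038, m=q3)
print(gap)
```

On those figures, Q3 has to add 0.02 weighted eval points and cut unit cost by roughly 10.5 percent, which is the two-question challenge the headline should be ready for.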
Eval scorecard. Tables, not prose. Columns: eval set name, version, weighted score, locked threshold, delta since last memo, sample size. Rows are the live eval sets: typically a primary “production” eval, a regression test, and one or two domain-specific evals. If the project has only one eval set, the project is under-instrumented and that fact belongs in section 5.
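A scorecard like this can be generated rather than hand-typed, which keeps the table consistent between memos. The following is a minimal sketch using the rows from the worked example later in this piece; `EvalRow`, `render_scorecard`, and `under_instrumented` are hypothetical names, and the extra Status column is an assumption added for illustration rather than part of the template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRow:
    name: str
    version: str
    score: float       # current weighted score
    threshold: float   # locked threshold
    delta: float       # change since the last memo
    n: int             # sample size

    @property
    def headroom(self) -> float:
        """Points of margin above the locked threshold (negative if failing)."""
        return round(self.score - self.threshold, 4)

    @property
    def passes(self) -> bool:
        return self.score >= self.threshold


def render_scorecard(rows: list[EvalRow]) -> str:
    """Render the rows as a pipe table, one line per eval set."""
    lines = [
        "| Eval set | Version | Score | Threshold | Delta | N | Status |",
        "|---|---|---|---|---|---|---|",
    ]
    for r in rows:
        status = "pass" if r.passes else "FAIL"
        lines.append(
            f"| {r.name} | {r.version} | {r.score:.2f} | {r.threshold:.2f} "
            f"| {r.delta:+.2f} | {r.n} | {status} |"
        )
    return "\n".join(lines)


def under_instrumented(rows: list[EvalRow]) -> bool:
    """A project with only one eval set belongs in the risks section."""
    return len(rows) <= 1


# Rows taken from the worked example in this piece.
rows = [
    EvalRow("Production agent eval", "v1.4", 0.83, 0.80, 0.05, 240),
    EvalRow("Regression hardening", "v0.3", 0.91, 0.85, 0.02, 180),
    EvalRow("Adversarial / prompt-injection", "v0.2", 0.78, 0.75, 0.04, 120),
]
table = render_scorecard(rows)
print(table)
```

Headroom is worth computing even when it is not printed: a row sitting 0.03 above its threshold is a different conversation from one sitting 0.06 above it.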
Unit cost trajectory. A small line chart of cost-per-canonical-unit over the last four periods, paired with a table showing the same. The narrative belongs in two sentences: what trend is the chart showing, what is driving it (model swap, prompt compression, retrieval tuning, batching). Boards understand cost trajectories when they look like cost trajectories.
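The "what is driving it" sentence is easier to defend when each driver's share is turned into a dollar figure. A minimal sketch of that arithmetic follows, assuming the Q1 and Q2 close costs and the driver shares from the worked example later in this piece; `attribute_cost_change` is a hypothetical helper, not an established method for cost attribution.

```python
def attribute_cost_change(start: float, end: float,
                          shares: dict[str, float]) -> dict[str, float]:
    """Split a change in unit cost across drivers by their stated share."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("driver shares must sum to 1.0")
    change = start - end  # positive when unit cost has fallen
    return {driver: round(change * share, 5) for driver, share in shares.items()}


# Figures taken from the worked example in this piece.
start, end = 0.061, 0.038
drivers = {
    "model swap": 0.35,
    "prompt compression": 0.40,
    "batched retrieval": 0.25,
}

saved = attribute_cost_change(start, end, drivers)
pct_decline = round(100 * (start - end) / start, 1)

print(f"Unit cost fell {pct_decline}% quarter over quarter")
for driver, dollars in saved.items():
    print(f"  {driver}: ${dollars:.5f} per completion")
```

On those figures the quarter-over-quarter decline is about 37.7 percent, or $0.023 per completion, of which prompt compression accounts for roughly $0.0092.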
Last model-upgrade cycle. Three to five sentences plus a small table. What model changed. What was the regression rate. How long was the triage. What is the new locked threshold. What is expected in the next cycle. This section is the most under-written part of most AI memos and the most informative when written well.
Risks and the regression tail. Two or three live risks, each with: the risk in one sentence, the size in unit terms (regression points, dollar exposure, days at risk), the triage path, the owner. Avoid the temptation to list ten risks; the board’s attention is finite and the project’s two real risks are not on the list of ten.
Stage-gate position. State which gate the project is in (Q1/Q2/Q3/Q4 within the year, and 90-day / 12-month / 24-month within the long horizon). State the named criteria for the next gate. State the kill criterion at the next gate. We unpack the gate sequence in the payback paradox piece.
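A named kill criterion can be written as a check that returns the reasons it tripped, which is what makes the gate decision mechanical. The sketch below uses the criterion from the worked example later in this piece (regression rate above 15 percent, or no production traffic at threshold). The function name and inputs are hypothetical, and the 9 percent input is the Q2 upgrade-cycle figure reused purely for illustration.

```python
def evaluate_kill_criteria(regression_rate: float,
                           production_traffic_at_threshold: bool,
                           max_regression_rate: float = 0.15) -> list[str]:
    """Return the kill criteria that were tripped; an empty list means continue."""
    reasons = []
    if regression_rate > max_regression_rate:
        reasons.append(
            f"regression rate {regression_rate:.0%} exceeds {max_regression_rate:.0%}"
        )
    if not production_traffic_at_threshold:
        reasons.append("no production traffic at threshold")
    return reasons


# Illustrative inputs only: 9 percent is the Q2 upgrade-cycle regression rate.
tripped = evaluate_kill_criteria(regression_rate=0.09,
                                 production_traffic_at_threshold=True)
print("continue" if not tripped else f"kill: {'; '.join(tripped)}")
```

Returning the reasons rather than a bare yes or no gives the board the sentence it needs at gate close.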
Budget actuals against milestones. A simple table: milestone name, milestone target, milestone status (passed / near-miss / failed), spend approved, spend actual, spend forecast for the next milestone. The discipline is to report variance against the eval-threshold milestone, not against feature lists. We argue the feature-list trap in the AI agency milestone trap piece.
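The variance column is simple arithmetic, but computing it the same way each quarter keeps the table honest. A minimal sketch follows, using the approved and actual spend for the two closed milestones in the worked example later in this piece; `MilestoneSpend` is a hypothetical structure, and the sign convention (positive means overspend) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MilestoneSpend:
    name: str
    status: str          # passed / near-miss / failed
    approved_k: float    # spend approved, in thousands of dollars
    actual_k: float      # spend actual, in thousands of dollars

    @property
    def variance_k(self) -> float:
        # Positive means overspend against the approved amount.
        return self.actual_k - self.approved_k

    @property
    def variance_pct(self) -> float:
        return round(100 * self.variance_k / self.approved_k, 1)


# Closed milestones taken from the worked example in this piece.
closed = [
    MilestoneSpend("Q1 trajectory", "passed", 380, 372),
    MilestoneSpend("Q2 regression", "passed", 390, 396),
]

for m in closed:
    print(f"{m.name}: {m.status}, variance {m.variance_k:+.0f}k ({m.variance_pct:+.1f}%)")

total_variance_k = sum(m.variance_k for m in closed)
print(f"Cumulative variance: {total_variance_k:+.0f}k")
```

On those figures Q1 came in $8k under and Q2 $6k over, for a cumulative $2k underspend across two passed milestones.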
Decisions requested. Numbered list. Each ask has a decision-asker, an amount or scope, and a “why now” tied to evidence. “Approve $410k Q3 continuation, conditional on Q2 milestone pass (which is documented in section 2 row 1).” Vague asks invite vague approvals.
Appendix. Links to the eval report, the unit-economics dashboard, the regression triage log, the retainer SLA. The board does not read these in the meeting; the board reads them when a question forces them to. Make them findable.
A worked example
Here is a compressed example for a fictional Q2 review of an AI agent project at an enterprise SaaS company.
1. Headline finding. Recommend Q3 continuation at $410k against a Q3 milestone of ≥ 0.85 weighted eval and ≤ $0.034 cost-per-completion. Q2 milestone passed at 0.83 / $0.038. Project is on the trajectory toward the 12-month capability gate.
2. Eval scorecard.
| Eval set | Version | Score | Threshold | Delta | N |
|---|---|---|---|---|---|
| Production agent eval | v1.4 | 0.83 | 0.80 | +0.05 | 240 |
| Regression hardening | v0.3 | 0.91 | 0.85 | +0.02 | 180 |
| Adversarial / prompt-injection | v0.2 | 0.78 | 0.75 | +0.04 | 120 |
3. Unit cost trajectory. Cost-per-completion has fallen from $0.061 (Q1 close) to $0.038 (Q2 close), driven by a Claude 4.7 -> 4.8 swap (35 percent of the gain), prompt compression on the planning step (40 percent), and batched retrieval (25 percent). Q3 forecast is $0.030 to $0.034 contingent on retrieval index restructure.
4. Last model-upgrade cycle. Claude 4.7 -> 4.8 swap completed week 7 of Q2. Regression rate on the production eval was 9 percent. Triage took 11 engineering days. New locked threshold is 0.83 weighted, up from 0.80. Next cycle (Claude 4.8 -> 4.9) is calendared for week 6 of Q3; reserve drawdown forecasted at $42k.
5. Risks and the regression tail. Two live risks. (a) Eval set v1.4 is leaning on synthetic prompts in the long-tail bucket; refresh of the long-tail with real customer prompts is in flight, owner Jenna, target close week 4 of Q3. Exposure: roughly 0.04 of weighted score if the synthetic bias proves out. (b) Vendor deprecation of the embedding model is rumored for Q4; we are pre-budgeting a 60-engineering-day swap and threshold re-lock against the new embeddings, owner Marcus, contingent activation.
6. Stage-gate position. Project is in Q2 of the calendar year and in the 90-day eval-feedback gate of the long horizon. Q2 milestone passed. Q3 milestone (production-traffic threshold + observability operating + retainer drafted) is on track. Kill criterion at end of Q3: regression rate above 15 percent or no production traffic at threshold.
7. Budget actuals against milestones.
| Milestone | Target | Status | Spend approved | Spend actual | Q3 forecast |
|---|---|---|---|---|---|
| Q1 trajectory | eval suite + 0.78 weighted | Passed | $380k | $372k | n/a |
| Q2 regression | hold across upgrade | Passed | $390k | $396k | n/a |
| Q3 production | traffic at threshold | In flight | $410k requested | n/a | $410k |
8. Decisions requested.
- Approve $410k Q3 continuation. Conditional on Q2 milestone pass (section 2 row 1).
- Approve $42k drawdown from the model-upgrade reserve for the Claude 4.8 -> 4.9 cycle (section 4).
- Approve drafting of the 25 percent annualized maintenance retainer to be in place by Q3 close (section 6).
9. Appendix. Eval reports v1.4 / regression v0.3 / adversarial v0.2. Unit economics dashboard URL. Regression triage log Q2. Retainer SLA draft v0.2.
That is the entire memo, in roughly 700 words. A board director reads it in five minutes, understands what is being asked and why, and approves or pushes back on evidence, not vibes.
What to leave out
Three categories of content do not belong in an AI board memo and consistently appear in one anyway.
Feature lists and ship counts. A list of features shipped, agents launched, or models swapped is not progress; it is activity. Progress is the eval delta. The temptation to fill space with a feature list is strongest when the eval delta is unflattering; recognize the smell. We argue the case in the features-shipped piece.
Demo screenshots without eval anchors. A screenshot of a successful agent run is the AI equivalent of a happy-path test. It does not tell the board what the failure rate is on the workload that matters. If the screenshot must be in the memo, it belongs in the appendix with a pointer to the eval report.
Vendor pitches in disguise. “We are evaluating switching to vendor X” is appropriate; “vendor X is the future” is not. Boards are sophisticated enough to detect a memo that is selling a vendor decision rather than reporting on an investment. Keep the analysis on the eval data; let the data drive the vendor question.
A memo that respects what to leave out is half as long and twice as useful as one that does not.
Frequently asked questions
What is an AI project board memo?
A 1,800 to 2,500-word document that surfaces the eval thresholds, unit cost trajectory, and staged-payback gate evidence a board needs to approve the next 90 days of spend on an AI project. It replaces the feature-list update memo that is appropriate for traditional software but mis-prices AI work.
Why does the AI memo need a different template?
Because the questions a fiduciary has to answer about AI work are different. Is the system measurably working at threshold? Is the unit cost going the right way? Did the last model upgrade move the curve? Is the project earning the next gate? Jira and finance cannot answer those; an eval scorecard, a unit-cost chart, an upgrade-cycle section, and a stage-gate position section can.
What goes in the eval scorecard?
The named eval sets (production, regression, domain-specific, adversarial), their versions, current weighted scores, locked thresholds, deltas since last memo, and sample sizes. Three to seven rows, in a table, not prose. If the project has only one eval set the under-instrumentation belongs in the risks section.
How do you report unit cost?
Cost-per-canonical-unit (per-completion, per-action, per-resolved-ticket; whichever is the canonical workload unit) over four reporting periods, with the projection for the next two. The narrative names the drivers of the trend in two sentences: model swap, prompt compression, retrieval tuning, batching.
What is the model-upgrade cycle section?
A short report on what changed in the most recent frontier model release, the regression rate the project saw, the triage cost in engineering days, and the new locked threshold. It also forecasts the next cycle and the reserve drawdown expected. This is the most under-written and most informative section in a serious AI memo.
How is the stage-gate position section different from a status report?
A status report says “we are on track.” A stage-gate position section says “we are in the Q3 production gate, the named kill criteria are X and Y, the evidence we will present at gate close is Z.” It is a contract with the board about what the next 90 days will produce, and what would constitute failure.
What should the board memo not include?
Feature lists, demo screenshots without eval anchors, vendor pitches in disguise. These three categories pad the memo and obscure the signal. A clean memo is half as long and twice as useful.
How long should an AI board memo be?
1,800 to 2,500 words, readable in seven minutes. A reader who stops after the headline has the decision; a reader who stops after section 4 has the operational signal; a reader who stops after section 6 has the strategic position. The full memo is the audit trail.
How does this template relate to the 4-component value model?
The 4-component value model (capability earned, time-to-value, downside-risk-reduced, optionality created) feeds the eval scorecard and the unit cost trajectory in sections 2 and 3. The staged-payback gate framework (90-day / 12-month / 24-month) populates section 6. The board memo is the consolidated artifact that those frameworks feed; it is not a separate framework.
Key takeaways
- An AI project board memo answers four questions a software board memo cannot: is the system measurably working, is unit cost going the right way, did the last model upgrade move the curve, is the project earning the next gate.
- The 9-section template: headline, eval scorecard, unit cost trajectory, last model-upgrade cycle, risks and regression tail, stage-gate position, budget actuals against milestones, decisions requested, appendix.
- Eval scorecard is a table, not prose. Three to seven rows. Named sets, versions, scores, thresholds, deltas, sample sizes.
- Unit cost is reported as a four-period trajectory with two-period forecast and a two-sentence narrative on drivers.
- The model-upgrade cycle section is the most under-written part of most AI memos and the most informative when done right.
- Stage-gate position names the next gate, the named criteria, the named kill criterion. It is a contract with the board, not a status update.
- Leave out feature lists, demo screenshots without eval anchors, and vendor pitches in disguise. A clean memo is half as long and twice as useful.
- Total length 1,800 to 2,500 words, readable in seven minutes by a non-technical director.
The right board memo template does not just save the board time. It changes which AI projects survive and which get killed, by making the evidence the board needs the easiest evidence in the room to find.
Arthur Wandzel