
Why AI agencies should publish their post-mortems

The AI agency industry is one of the few professional service categories where almost nobody publishes their failures. Law firms write up reversed appeals. Consultancies publish McKinsey-style retrospectives. Software vendors maintain status pages with public incident histories. Site reliability teams at hyperscalers publish post-mortems that read like surgical reports. AI agencies, the firms most likely to ship probabilistic systems into mission-critical workflows, almost universally do not. The closest most come is a sanitized case study, written months after the fact, in which the only failure mentioned is the one the client had before the agency arrived.

This essay argues for the opposite practice: AI agencies that publish sanitized but substantive post-mortems on a regular cadence will, within 18 months, dominate their category on the three dimensions that matter: trust, hiring, and pricing power. The model already exists. Google’s Site Reliability Engineering organization has been writing post-mortems internally, and publishing abstracted versions of them, for over a decade, and the cultural norm (blameless, structured, evidence-driven) has spread to most serious infrastructure organizations on Earth. The AI agency category needs the same norm, and the agencies that adopt it first will compound an advantage faster than any marketing budget can buy.

For the broader argument on what an AI dev partnership should be in 2026, see the AI agency manifesto. For the trust signals that separate operators from resellers, see the AI agency trust ladder. For the failure modes that any post-mortem program will keep encountering, see lessons from 40 AI agency engagements.

Decision Scope

This article is an editorial decision framework, not legal, financial, security, or accounting advice. Treat numeric examples as illustrative planning heuristics unless a source is cited, then validate the assumptions against your own contracts, data, controls, and budget model before acting.

Contents

  1. The Google SRE proof model
  2. What a real post-mortem contains
  3. The three objections, and why each one is solvable
  4. The citation flywheel
  5. What the cadence should look like
  6. Why the first agency to do this wins the category

1. The Google SRE proof model

The Site Reliability Engineering book, published by Google in 2016 and freely available in full, dedicates an entire chapter (chapter 15) to post-mortem culture. The argument is simple: if a system can fail, it will, and the value of a failure is determined entirely by how systematically the organization learns from it. The chapter codifies the structure: a blameless tone, a written timeline, a concrete root cause analysis, an explicit list of what worked and what did not, and a set of follow-up actions with named owners. The post-mortem is a deliverable, not a courtesy. It is reviewed, signed off, indexed, and revisited.

What the SRE community discovered, over more than fifteen years of practice, is a counter-intuitive second-order effect: publishing post-mortems did not erode trust in the engineering organization. It built trust. It built it inside the company, where it changed how teams talked about incidents (from defensive to investigative), and it built it outside the company, where customers, peers, and prospective hires started reading those documents and choosing to trust the organization that produced them over the organizations that did not. The same dynamic applies, almost without modification, to AI agencies in 2026. The probabilistic substrate of the work (models that hallucinate, prompts that drift, evals that miss) produces failure. Pretending otherwise is a marketing posture. Publishing the failure structurally is an engineering posture.

The AI agency space is approximately where the SRE space was in 2008: failure-rich, vocabulary-poor, and dominated by firms whose marketing claims have not been pressure-tested by public artifacts. By adopting a structured publishing cadence, a small number of agencies can become the firms whose post-mortems the next decade of AI engineering hires read in school and the next decade of AI engineering buyers read while drafting RFPs.

2. What a real post-mortem contains

One of the largest barriers to agencies publishing post-mortems is not legal; it is that most agencies do not know what a serious post-mortem contains. The following structure, abstracted from the SRE chapter and adapted for AI engagements, is non-negotiable. A document missing any of these sections is not a post-mortem; it is a story.

Section 1: failure summary. Two to four sentences. What happened, when, who noticed, what the user-visible impact was. No spin. No justification. The summary reads like a court reporter’s caption: neutral, dated, factual.

Section 2: eval-detection or eval-miss. Did the eval suite catch the failure? If yes, why did it reach production anyway: was the gate non-blocking, was the threshold too loose, was the eval running on the wrong dataset? If no, which part of the input distribution was left uncovered, and what eval was missing that would have caught it? This section is what distinguishes an AI post-mortem from a generic software incident review. The whole point of eval discipline is to make the eval suite the leading indicator of quality; when the eval suite fails to lead, the post-mortem must explain why, specifically.
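
The branching in this section can be sketched as a small triage function. This is a minimal sketch, not a prescribed tool: the `EvalOutcome` record, its field names, and the order in which the checks run are all hypothetical, chosen only to mirror the questions above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalOutcome:
    # Hypothetical record of how the eval suite behaved for one incident.
    signal_seen: bool            # did any eval score degrade on the failing inputs?
    gate_blocking: bool          # was that eval wired to block the release?
    score: Optional[float]       # score on the failing slice, if measured
    threshold: Optional[float]   # pass threshold configured for that eval
    dataset_matches_prod: bool   # was the eval dataset drawn from production-like inputs?


def eval_section_question(o: EvalOutcome) -> str:
    """Return the question the eval-detection / eval-miss section must answer."""
    if not o.signal_seen:
        return "eval-miss: which input slice was uncovered, and which eval would have caught it?"
    if not o.gate_blocking:
        return "eval-detection: the gate was not blocking; why was it advisory?"
    if o.score is not None and o.threshold is not None and o.score >= o.threshold:
        return "eval-detection: the score degraded but cleared the threshold; why was it that loose?"
    if not o.dataset_matches_prod:
        return "eval-detection: the eval ran on a dataset unlike production; why?"
    return "eval-detection: the gate should have blocked; reconstruct how the release bypassed it"
```

The checks run from cheapest to confirm to hardest; an agency may reasonably order them differently.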

Section 3: root cause via 5 whys. A literal 5-whys ladder. Each “why” must be answerable from artifacts (git history, eval logs, incident timelines, prompt diffs), not from memory or speculation. The root cause that emerges at the bottom of the ladder is rarely “the model hallucinated”; it is more often “the eval set was missing a tail-distribution slice because the engagement charter never named one as a deliverable.” The systemic root cause is the one that drives the prevention items.
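
A ladder that meets this bar can be checked mechanically before review. The sketch below assumes a hypothetical `Why` record whose `artifacts` list holds references such as commit SHAs or eval-log paths; it only verifies that there are five rungs and that each one cites something, not that the citations are correct.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Why:
    question: str
    answer: str
    # References that back the answer: commit SHAs, eval-log paths, prompt-diff links.
    artifacts: List[str] = field(default_factory=list)


def ladder_problems(ladder: List[Why]) -> List[str]:
    """List the ways a 5-whys ladder falls short of the artifact-grounded standard."""
    problems = []
    if len(ladder) != 5:
        problems.append(f"expected 5 rungs, found {len(ladder)}")
    for i, rung in enumerate(ladder, start=1):
        if not rung.artifacts:
            problems.append(f"why #{i} cites no artifact")
    return problems
```

A rung answered from memory, such as “the model hallucinated” with no attached eval log, shows up as a named problem rather than slipping through review.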

Section 4: customer impact. Quantified. How many users were affected, for how long, what was the dollar or operational consequence to the client. If exact numbers are confidential, sanitized ranges (10–100 users, 4–8 hours, low-five-figures revenue impact) are acceptable. Hand-waving is not. A post-mortem that cannot quantify impact is not a post-mortem.
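
Widening exact figures into ranges is mechanical enough to script, which keeps sanitization consistent across post-mortems. The helpers below are illustrative assumptions: power-of-ten bands for counts, doubling bands for hours, and a low/mid/high reading of the leading digit for dollar amounts. None of these bucket schemes is a standard; use whatever bands counsel signs off on.

```python
def count_range(n: int) -> str:
    """Widen an exact count to its power-of-ten band, e.g. 42 -> '10–100'."""
    if n < 1:
        return "0"
    low = 10 ** (len(str(n)) - 1)
    return f"{low:,}–{low * 10:,}"


def duration_range(hours: int) -> str:
    """Widen a whole number of hours to its doubling band, e.g. 5 -> '4–8'."""
    if hours < 1:
        return "under 1"
    low = 1 << (hours.bit_length() - 1)  # largest power of two <= hours
    return f"{low}–{low * 2}"


def revenue_band(amount: int) -> str:
    """Describe a positive dollar figure by digit count and leading digit, e.g. 23500 -> 'low-five-figures'."""
    if amount < 1:
        raise ValueError("amount must be positive")
    names = {3: "three", 4: "four", 5: "five", 6: "six", 7: "seven"}
    digits = len(str(amount))
    lead = int(str(amount)[0])
    tier = "low" if lead <= 3 else "mid" if lead <= 6 else "high"
    return f"{tier}-{names.get(digits, str(digits))}-figures"
```

With these bands, an incident affecting 42 users for 5 hours at a cost of $23,500 is published as “10–100 users, 4–8 hours, low-five-figures revenue impact.”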

Section 5: mitigation timeline. Minute-by-minute or hour-by-hour. Detection at T+0, escalation at T+8, hypothesis at T+22, mitigation deployed at T+47, full resolution at T+92. The timeline is the section that, more than any other, demonstrates whether the agency has actual incident-response muscle or is making it up after the fact.
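
If incident events are logged with wall-clock timestamps, the T+ offsets can be derived rather than reconstructed from memory. A minimal sketch, assuming events arrive as hypothetical `(label, datetime)` pairs; the timestamps below are invented to reproduce the offsets in the example above.

```python
from datetime import datetime
from typing import List, Tuple


def relative_timeline(events: List[Tuple[str, datetime]], t0: datetime) -> List[str]:
    """Render (label, timestamp) pairs as whole-minute offsets from detection time t0.

    Events before detection (for example, the faulty deploy) render as T-N.
    """
    lines = []
    for label, ts in sorted(events, key=lambda e: e[1]):
        minutes = int((ts - t0).total_seconds() // 60)
        lines.append(f"T{minutes:+d} {label}")
    return lines


detected = datetime(2026, 3, 4, 14, 0)
events = [
    ("detection", detected),
    ("escalation", datetime(2026, 3, 4, 14, 8)),
    ("hypothesis", datetime(2026, 3, 4, 14, 22)),
    ("mitigation deployed", datetime(2026, 3, 4, 14, 47)),
    ("full resolution", datetime(2026, 3, 4, 15, 32)),
]
for line in relative_timeline(events, detected):
    print(line)
```

Deriving offsets from logged timestamps is what separates a real timeline from one written after the fact.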

Section 6: prevention. A list of follow-up items with named owners and deadlines. “Add tail-distribution eval slice covering inputs of type X, owner @engineer, due 2026-06-15.” Each item must be tracked in the same issue tracker as feature work. Prevention items that are still open after their deadline are flagged in the next post-mortem, not buried.
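
The overdue rule is simple to apply at the start of each post-mortem. This sketch assumes a hypothetical `PreventionItem` record exported from the issue tracker; the second item in the example is invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class PreventionItem:
    description: str
    owner: str
    due: date
    done: bool = False


def overdue(items: List[PreventionItem], today: date) -> List[PreventionItem]:
    """Open items past their deadline; these get flagged in the next post-mortem."""
    return [item for item in items if not item.done and item.due < today]


items = [
    PreventionItem("Add tail-distribution eval slice covering inputs of type X", "@engineer", date(2026, 6, 15)),
    PreventionItem("Make the regression eval gate blocking", "@engineer", date(2026, 7, 30)),
]
for item in overdue(items, today=date(2026, 7, 1)):
    print(f"OVERDUE since {item.due}: {item.description} (owner {item.owner})")
```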

A document with these six sections, written in the blameless tone the SRE community pioneered, is publishable. A document missing any of them is not. The discipline of producing the structure is itself most of the value.
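
The six-section rule can be enforced as a pre-publication gate. A minimal sketch, assuming a draft is represented as a mapping from section name to body text; the section keys are hypothetical labels for the six sections above.

```python
from typing import Dict, List

# The six sections described above, in order.
REQUIRED_SECTIONS = (
    "failure summary",
    "eval-detection or eval-miss",
    "root cause (5 whys)",
    "customer impact",
    "mitigation timeline",
    "prevention",
)


def missing_sections(doc: Dict[str, str]) -> List[str]:
    """Return the required sections that are absent or blank in a draft."""
    return [name for name in REQUIRED_SECTIONS if not doc.get(name, "").strip()]


def is_postmortem(doc: Dict[str, str]) -> bool:
    """A draft missing any required section is a story, not a post-mortem."""
    return not missing_sections(doc)
```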

3. The three objections, and why each one is solvable

When the proposal to publish post-mortems is raised internally at AI agencies, three objections appear, and they appear in the same order almost every time.

Objection 1: legal. “Counsel will not allow it.” This is the easiest objection to neutralize, because the SRE community has been navigating it for over a decade. The pattern: every published post-mortem is sanitized through a documented review process that strips client-identifying information, replaces specific numbers with ranges, and abstracts proprietary details into generalized patterns. The agency publishes the shape of the failure, not the fingerprint. Counsel reviews each post-mortem before publication using a one-page checklist. This is the same workflow law firms use for anonymized case retrospectives. The agencies claiming otherwise have not asked counsel; they have used counsel as a rhetorical shield against the discomfort of publishing.

Objection 2: competitive. “We will be giving away our playbook.” This objection collapses on close inspection. What competitors learn from a published post-mortem is the answer to a problem you have already solved. By the time the post-mortem is written, the prevention is already shipping in your engagements. A competitor who reads it gets the artifact; you keep the muscle. Worse for the competitor: their next pitch will be benchmarked by buyers against a published artifact they cannot match. The agencies most worried about this objection are usually the ones with the least defensible practice; the ones with deep practice know the documents are a signal of capability, not a transfer of capability.

Objection 3: NDA. “Our clients will not allow us to write about engagements.” This is the most legitimate of the three, and it is also the most solvable. The contractual fix: every engagement contract includes, from day 1, a publication clause granting the agency the right to publish a sanitized post-mortem on any incident, subject to a mutually agreed sanitization review and a 90-day embargo. Most clients, presented with the clause as a default, agree, particularly when shown that publication enhances their reputation as an organization that worked with a serious engineering partner. The clients who refuse self-select out of the agency’s portfolio, and that is a feature, not a bug. The post-mortem clause is a soft filter for clients who treat AI engagements as a procurement event rather than a learning event.
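
For scheduling, the clause reduces to two conditions: the sanitization review has signed off, and the embargo has elapsed. The sketch below assumes the 90 days run from the date the incident was resolved; the clause as described does not say when the clock starts, so that anchor date is an assumption to settle in the contract.

```python
from datetime import date, timedelta
from typing import Optional

EMBARGO_DAYS = 90  # default from the clause above; the signed contract governs


def earliest_publication(resolved_on: date, review_approved: bool) -> Optional[date]:
    """Earliest date a sanitized post-mortem may go live, or None while review is pending.

    Assumes (hypothetically) that the embargo runs from the incident's resolution date.
    """
    if not review_approved:
        return None
    return resolved_on + timedelta(days=EMBARGO_DAYS)
```

An incident resolved on March 1, 2026 with an approved review could go live on May 30, 2026 under this assumption.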

Across the three objections, the pattern is consistent: each one feels prohibitive in the abstract, and each one has a documented operational fix that the SRE community has been running for a decade. The objections are not the reason agencies do not publish post-mortems. The reason is that publishing them is uncomfortable, and discomfort, in the absence of a forcing function, defeats most cultural changes.

4. The citation flywheel

The economic case for publishing post-mortems is the citation flywheel. It works in three loops, each of which compounds.

Loop 1: inbound link velocity. A well-written post-mortem on an AI failure mode (say, “the eval-miss that caused our agentic system to send 42 incorrect invoices”) is the kind of artifact that gets linked from engineering blogs, newsletters, podcast show notes, and conference talks. Each link is a vote for the agency in search rankings and, more importantly, a vote in the social graph of practitioners who are about to recommend an AI agency to a buyer. A single well-written post-mortem that gets cited in three engineering newsletters can generate more inbound demand than an entire quarter of paid acquisition. The post-mortem is the ad that buyers want to read.

Loop 2: hiring magnet. Senior AI engineers (the ones with options, who choose where to work based on the substance of the engineering culture rather than the comp band) read post-mortems before they read job descriptions. An agency with a public post-mortem corpus signals, before any conversation, that the engineering culture is blameless, evidence-driven, and continuously improving. The recruiting cost-per-hire for senior engineers drops by an order of magnitude when the agency’s post-mortem page does the first three rounds of selling. The post-mortem corpus is the recruiting moat that no marketing budget can replicate.

Loop 3: pricing power. Buyers comparing two agencies on a six-figure engagement, where one has a published post-mortem corpus and one does not, are buying different things. The first agency is selling a documented engineering practice; the second is selling promises. The first commands a 20–40 percent rate premium without the buyer flinching, because the artifacts are doing the price-justification work. Over a year of engagements, the rate premium more than pays for the editorial overhead of producing the post-mortems, and that overhead is, in any case, mostly the cost of the discipline the agency should be running internally regardless.

The three loops compound. Inbound demand fills the pipeline; the hiring magnet fills the bench with engineers who can ship the work; the pricing power funds the practice. None of the three loops can be bought; all three are produced by the consistent act of publishing structured, sanitized post-mortems on a cadence the market can rely on.

5. What the cadence should look like

A defensible cadence is one published post-mortem per quarter, minimum. More than that is welcome; less than that loses the flywheel. The post-mortems do not need to be incidents; a “near-miss” post-mortem (an eval that caught a regression before production, with a structured analysis of why the eval was there to catch it) is often more pedagogically valuable than a production incident. The discipline is the publishing, not the failing. An agency that publishes one structured post-mortem per quarter for two years has eight artifacts; that corpus is the equivalent of a small engineering book, distributed in a format the buyer-and-engineer audience reads.
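
Whether an agency is actually holding the cadence is easy to check from its publication dates. A minimal sketch, assuming calendar quarters and a hypothetical list of publish dates; it reports every quarter in the window that has no post-mortem.

```python
from datetime import date
from typing import Iterable, List, Tuple

Quarter = Tuple[int, int]  # (year, quarter number 1-4)


def quarter_of(d: date) -> Quarter:
    return d.year, (d.month - 1) // 3 + 1


def missed_quarters(published: Iterable[date], start: date, end: date) -> List[Quarter]:
    """Quarters from start to end (inclusive) with no published post-mortem."""
    covered = {quarter_of(d) for d in published}
    year, q = quarter_of(start)
    last = quarter_of(end)
    missed = []
    while (year, q) <= last:
        if (year, q) not in covered:
            missed.append((year, q))
        year, q = (year + 1, 1) if q == 4 else (year, q + 1)
    return missed
```

Two years of one post-mortem per quarter covers eight quarters, so the function returns an empty list for that window.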

The publication channel matters. A dedicated post-mortems index on the agency’s own domain, not a third-party blog platform, concentrates the SEO weight and the canonical-URL trust signal in one place. Each post-mortem is dated, authored (with a real engineer’s name, not a “team” byline), and indexed in a chronological list that allows readers to see the cadence. The index page itself becomes a high-value crawled artifact; it is the page buyers send to their boards as evidence that the agency they have chosen runs a serious engineering practice.

6. Why the first agency to do this wins the category

Categories are won by firms that ship the discipline first. The AI agency category, in 2026, has approximately zero firms running a public post-mortem cadence at the SRE bar. The first three to do it will own the category’s reputation for engineering rigor for the rest of the decade. The fourth and fifth will be playing catch-up. The tenth will be marketing into a market that has already decided who the credible firms are.

The barrier to being one of the first three is not capability; most serious AI agencies have the engineering practice to produce these documents internally already. The barrier is willingness to convert internal discipline into public artifact, on a cadence the market can rely on, against the friction of legal review, NDA negotiation, and the cultural discomfort of publishing failure. The agencies that get past that friction will, within 18 months, look at their pipeline, their hiring funnel, and their rate card and find each of them transformed. The ones that wait will, in the same 18 months, watch the category form a hierarchy that includes them as a footnote.

The post-mortem is not a marketing tactic. It is the public-facing artifact of the engineering practice that already exists, or should exist, inside any AI agency that wants to compound rather than dissolve. The Google SRE community proved the model. The AI agency category is waiting for someone to ship it.

The cost of starting is one structured document per quarter. The cost of not starting is watching the agencies that do publish define what “serious AI engineering partner” means for the next decade.


Arthur Wandzel is the founder of SFAI Labs, a forward-deployed AI development agency in San Francisco. The argument above generalizes patterns from Google’s published Site Reliability Engineering book and from observed practice across AI agency engagements; no specific incident or client is described.

Frequently Asked Questions

Why should AI agencies publish post-mortems at all?

AI agencies ship probabilistic systems into mission-critical workflows, which means failure is intrinsic to the work rather than incidental. Publishing structured post-mortems converts an unavoidable failure stream into compounding trust, hiring, and pricing advantage. The Google Site Reliability Engineering organization has been doing it for over a decade, and the cultural norm has spread to most serious infrastructure organizations on Earth. The AI agency category is approximately where the SRE category was in 2008, and the firms that adopt the publishing discipline first will dominate the category on every dimension that buyers and senior engineers care about.

What sections does a real AI engagement post-mortem contain?

Six non-negotiable sections: a neutral failure summary, an eval-detection or eval-miss analysis, a root cause via 5 whys grounded in artifacts rather than memory, a quantified customer impact section using sanitized ranges if needed, a minute-by-minute or hour-by-hour mitigation timeline, and a prevention list with named owners and deadlines tracked in the same issue tracker as feature work. A document missing any of these sections is not a post-mortem; it is a story. The discipline of producing the structure is itself most of the value.

How is the legal objection to publishing post-mortems solved?

Every published post-mortem runs through a documented sanitization review that strips client-identifying information, replaces specific numbers with ranges, and abstracts proprietary details into generalized patterns. The agency publishes the shape of the failure, not the fingerprint. Counsel reviews each post-mortem before publication using a one-page checklist. This is the same workflow law firms use to publish anonymized case retrospectives. Agencies claiming legal blocks usually have not asked counsel; they are using counsel as a rhetorical shield against the discomfort of publishing.

Doesn’t publishing post-mortems give competitors your playbook?

What competitors learn from a published post-mortem is the answer to a problem the publishing agency has already solved. By the time the post-mortem is written, the prevention is already shipping in active engagements. A competitor who reads it gets the artifact; the publishing agency keeps the muscle. The competitor is now visibly behind, because their next pitch will be benchmarked against a public artifact they cannot match. The agencies most worried about this objection are usually the ones with the least defensible engineering practice.

How do you handle client NDAs and confidentiality?

Every engagement contract includes, from day 1, a publication clause granting the agency the right to publish a sanitized post-mortem on any incident, subject to a mutually agreed sanitization review and a typical 90-day embargo. Most clients agree when presented with the clause as a default, particularly when shown that publication enhances their reputation as an organization that worked with a serious engineering partner. Clients who refuse self-select out of the agency’s portfolio, which is a feature rather than a bug; the clause is a soft filter for clients who treat AI engagements as a procurement event rather than a learning event.

What is the citation flywheel and how does it compound?

The citation flywheel runs in three compounding loops. First, inbound link velocity: well-written post-mortems get cited by engineering blogs, newsletters, and conference talks, generating both search authority and practitioner social-graph votes. Second, hiring magnet: senior AI engineers read post-mortems before job descriptions and self-select toward agencies with documented blameless engineering culture. Third, pricing power: buyers comparing two agencies, one with a published post-mortem corpus and one without, accept a 20 to 40 percent rate premium for the first without flinching, because the artifacts do the price-justification work. None of the three loops can be bought; all three are produced by consistent publishing on a cadence the market relies on.

What cadence should an agency publish post-mortems at?

A defensible cadence is one published post-mortem per quarter, minimum. More than that is welcome; less than that loses the flywheel. The post-mortems do not need to be production incidents; a near-miss post-mortem, where an eval caught a regression before production, is often more pedagogically valuable than an outage. The discipline is the publishing, not the failing. Two years at one per quarter produces eight artifacts, which is the equivalent of a small engineering book in the format the buyer-and-engineer audience reads.

Where should the post-mortems live and why?

On a dedicated post-mortems index on the agency’s own domain, not a third-party blog platform. This concentrates the SEO weight and the canonical-URL trust signal in one place. Each post-mortem is dated, authored with a real engineer’s name rather than a team byline, and indexed in a chronological list that shows the cadence at a glance. The index page itself becomes a high-value crawled artifact and the page that buyers send to their boards as evidence that the agency they have chosen runs a serious engineering practice.

What distinguishes an AI post-mortem from a generic software incident review?

The eval-detection or eval-miss section. The whole point of eval discipline in AI engagements is to make the eval suite the leading indicator of quality; when the eval suite fails to lead, the post-mortem must explain why, specifically. The questions are: did the eval catch the failure, and if not, which part of the input distribution was left uncovered, and what eval was missing that would have caught it? This section is what turns the document from a generic incident report into a structured artifact that advances the agency’s eval practice. Without it, the document is a software outage write-up that happens to mention a model.

Why does the first agency to do this win the category?

Categories are won by firms that ship the discipline first. The AI agency category in 2026 has approximately zero firms running a public post-mortem cadence at the SRE bar. The first three to do it will own the category’s reputation for engineering rigor for the rest of the decade. The barrier is not capability; most serious AI agencies already have the engineering practice to produce these documents internally. The barrier is willingness to convert internal discipline into public artifact against the friction of legal review, NDA negotiation, and the cultural discomfort of publishing failure. Firms that get past that friction will, within 18 months, find their pipeline, hiring funnel, and rate card transformed.

Last Updated: May 28, 2026


Arthur Wandzel

SFAI Labs helps companies build AI-powered products that work. We focus on practical solutions, not hype.
