
The AI Moat Audit: A 10-Question Test for 'Is This Real Defensibility?'

Most AI capabilities labeled “moat” are not. They are expensive infrastructure that the team has decided to call defensibility because the alternative, admitting that the capability is plumbing a competitor could replicate in a quarter, is uncomfortable. Real AI moats have specific structural properties: uniqueness that cannot be re-bought from a vendor, data leverage that compounds with usage, eval sets that are private and load-bearing, switching costs that are real rather than aspirational, scale economics that produce widening cost advantages, distribution that competitors cannot match, regulatory or compliance positioning, integration depth into customer workflows, talent that is genuinely scarce, and time-asymmetric learning loops. Ten questions, ten structural tests. Most AI features score 2 or 3 out of 10 on the audit; that is not a moat, it is infrastructure that the org should stop treating as defensibility. This piece is the audit, with scoring criteria, what each “yes” looks like in practice, and the action plan for the capabilities that fail the audit.

This is a spoke under the AI build-vs-buy-vs-hire decision matrix for 2026. The matrix’s first principle is that most capabilities resolve to a verb: build, buy, or hire. The audit in this piece is the test for whether a “build” decision is producing a moat or just producing infrastructure.

Why most AI moats fail the audit

The word “moat” got cheap in 2024. Most AI feature pitch decks named a moat (proprietary fine-tune, custom RAG pipeline, agentic orchestration, prompt engineering depth), and the named moat was rarely load-bearing. By 2026 the cheap moats have been visibly eroded. The fine-tune got beaten by the next frontier model. The custom RAG pipeline became a buy from any of six vendors. The agentic orchestration shipped as an open-source framework. The prompt engineering depth turned out to be three engineers’ tribal knowledge that walked out the door when one of them left.

The audit in this piece exists because the language of moat is sticky and the structure of moat is not. Teams keep using the word for capabilities that don’t have the structure, and the org keeps making strategic decisions based on the language rather than on the structure. The audit is the structural test that exposes the gap.

Ten questions, each scoring 0 or 1. The totals fall into three bands: capabilities scoring 7 or higher are real moats with multiple reinforcing properties; capabilities scoring 4 to 6 are partial moats that might survive some competitive pressures but not many of them; capabilities scoring 0 to 3 are not moats and should not be treated as such in strategic decisions.

Most AI capabilities score 2 to 4 on a serious audit. That is not damning; most capabilities are infrastructure, and infrastructure is fine. The damage comes from labeling infrastructure as moat and making strategic decisions on the wrong category.

Question 1: Could a competitor buy this from a vendor in 60 days?

The replicability question. If a serious competitor with reasonable engineering capacity could replicate the capability by signing an annual contract with a vendor and integrating the result in 60 days, the capability is not a moat regardless of how much engineering effort produced it.

The 60-day threshold is calibrated. Below 30 days, replication is trivial. Above 90 days, the integration depth itself starts producing moat (per Question 8). 60 to 90 days is the typical “vendor + integration sprint” window for replicating a non-moat AI capability.

Score 0 if a competitor could buy and integrate in 60 days. Score 1 if the capability genuinely cannot be re-bought: because no vendor offers it, because the org’s specific implementation depends on assets not available to vendors, or because the vendor’s offering does not produce the same outcome.

Most AI capabilities score 0 here. RAG pipelines are buyable. Agent orchestration is buyable. Prompt registries are operationally buyable (because the registry’s value is the org’s institutional knowledge, which the registry stores but does not produce). Fine-tunes against frontier-model outputs are buyable from any of several distillation services.

Question 2: Does the capability improve with our specific data over time?

The data leverage question. A capability that compounds with usage produces a widening gap with competitors who lack equivalent data. A capability that does not compound, that has the same quality at month 24 as at month 6, is infrastructure.

The compounding has to be real, not theoretical. The eval set must measurably improve quarter over quarter. The model’s behavior on the org’s specific tasks must show measurable upgrades from incorporated feedback. The data the org collects must be something competitors cannot collect at the same rate or with the same quality.

Score 0 if the capability does not measurably improve with the org’s specific data, or if the data the org collects is also collected by competitors. Score 1 if the data leverage is documented and measurable.
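One way to make “documented and measurable” concrete is to track the capability’s pass rate on the org’s own eval set, quarter over quarter. The following is a minimal sketch under stated assumptions: the function name, the `min_gain` tolerance, and the sample numbers are hypothetical and not part of the audit itself.

```python
from typing import Sequence


def shows_data_leverage(quarterly_scores: Sequence[float], min_gain: float = 0.01) -> int:
    """Return 1 if every quarter beats the previous one by at least min_gain, else 0.

    quarterly_scores: eval-set pass rates (0.0 to 1.0), oldest quarter first.
    min_gain: assumed cutoff for what counts as a "measurable" improvement.
    """
    if len(quarterly_scores) < 2:
        return 0  # no history means no documented leverage
    gains = [
        later - earlier
        for earlier, later in zip(quarterly_scores, quarterly_scores[1:])
    ]
    return 1 if all(gain >= min_gain for gain in gains) else 0


# Hypothetical pass rates for four consecutive quarters.
print(shows_data_leverage([0.70, 0.74, 0.79, 0.83]))
```

This only covers the first half of the question. Whether competitors can collect equivalent data is a judgment the code cannot make, so a 1 here is necessary but not sufficient for a 1 on Question 2.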

The pattern that fails this question most often: the org has collected a lot of usage data but does not have the infrastructure to convert the data into model improvement. The data exists, but the leverage does not.

Question 3: Is the eval set private and load-bearing?

The eval set is the most under-recognized moat asset in 2026. A private eval set that encodes the org’s specific quality bar, and that competitors cannot reproduce because they do not have the org’s user feedback, the org’s domain understanding, or the org’s edge cases, is structural defensibility.

Load-bearing means: the eval set gates production decisions. Model swaps run through the eval set. Prompt changes run through the eval set. Fine-tune evaluations run against the eval set. If the eval set is fancy decoration that doesn’t gate decisions, it is not load-bearing.

Score 0 if the eval set is publicly reproducible (for example, just a benchmark like MMLU or HellaSwag) or if it is not load-bearing in production decisions. Score 1 if the eval set encodes specific, hard-to-replicate quality criteria and gates real decisions.

The detail on eval set construction is in the case for buying your AI evaluation stack and building your evaluator. The evaluator (the eval set, scoring rubrics, threshold logic) is the moat asset; the stack (the runner, the dashboard) is buyable infrastructure.
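To illustrate what “load-bearing” can mean in practice, here is a hedged sketch of an eval gate: a model swap or prompt change ships only if it clears a pass-rate threshold on the private eval set. `EvalCase`, `gate_change`, and the sample cases are hypothetical names chosen for illustration; a real evaluator would carry richer rubrics than a single boolean check.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    passes: Callable[[str], bool]  # rubric check applied to the candidate's output


def gate_change(
    candidate: Callable[[str], str],
    eval_set: List[EvalCase],
    threshold: float,
) -> bool:
    """Return True only if the candidate's pass rate meets the threshold."""
    if not eval_set:
        raise ValueError("an empty eval set cannot gate anything")
    passed = sum(1 for case in eval_set if case.passes(candidate(case.prompt)))
    return passed / len(eval_set) >= threshold


# Hypothetical private cases encoding org-specific quality criteria.
eval_set = [
    EvalCase("refund policy for EU orders", lambda out: "14 days" in out),
    EvalCase("escalation path for outages", lambda out: "on-call" in out),
]


def candidate_model(prompt: str) -> str:
    # Stand-in for a call to the model or prompt under evaluation.
    return "Refunds are accepted within 14 days."


print(gate_change(candidate_model, eval_set, threshold=0.9))
```

In this sketch the cases and the threshold are the evaluator; the loop that runs them is the part any vendor's runner could replace.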

Question 4: What is the actual switching cost for a customer?

The switching cost question. A real moat increases customer switching cost, whether through data lock-in, workflow integration, learned customer behavior, or contract structure. An aspirational moat says “switching is hard” without naming what makes it hard.

The switching cost has to be quantitatively answerable. “Customers would lose 6 months of accumulated context” is quantitative. “Customers would have to retrain their team on a new vendor” is quantitative. “Customers like our UI” is not; preference is not switching cost.

Score 0 if the switching cost is under 30 days of customer-side effort to migrate to a competitor. Score 1 if migration genuinely takes 90+ days and produces measurable customer value loss during the transition.

The pattern that fails this question: capabilities where customers are not committed because the integration is shallow. The customer can switch in a sprint, and the moat is decoration.

Question 5: Do scale economics widen our cost advantage?

The cost-curve question. A real moat has cost economics that improve with scale faster than competitors’ costs improve. The cost advantage widens over time, producing a structural pricing advantage that competitors cannot match without subsidizing.

Scale economics in AI come from a few specific places: amortized investment in the eval set across many customers, distillation pipelines that turn high-volume usage into cheap inference, infrastructure efficiency that improves with workload patterns the org has accumulated, and data that improves model quality (linking back to Question 2).

Score 0 if cost per useful task is flat or improving at rates similar to competitors’. Score 1 if cost per useful task is improving meaningfully faster than competitors’; demonstrating this typically requires a documented unit economics curve.
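A unit economics curve can be reduced to one comparable number: the compound per-quarter rate at which cost per useful task falls. The sketch below is an illustration only. The function names, the `margin` used to stand in for “meaningfully faster,” and the sample costs are all assumptions, and competitor figures would in practice be estimates.

```python
from typing import Sequence


def quarterly_decline_rate(cost_per_task: Sequence[float]) -> float:
    """Compound per-quarter rate of cost decline, given costs oldest first."""
    if len(cost_per_task) < 2 or min(cost_per_task) <= 0:
        raise ValueError("need at least two quarters of positive costs")
    periods = len(cost_per_task) - 1
    return 1.0 - (cost_per_task[-1] / cost_per_task[0]) ** (1.0 / periods)


def scale_economics_score(
    ours: Sequence[float],
    competitor_estimate: Sequence[float],
    margin: float = 0.05,
) -> int:
    """Return 1 if our cost falls faster than the competitor's by at least margin."""
    gap = quarterly_decline_rate(ours) - quarterly_decline_rate(competitor_estimate)
    return 1 if gap >= margin else 0


# Hypothetical cost per useful task (in dollars) over three quarters.
ours = [1.00, 0.80, 0.64]
competitor_estimate = [1.00, 0.95, 0.9025]
print(scale_economics_score(ours, competitor_estimate))
```

A flat curve, or one that tracks the competitor's because both are driven by the same token prices, yields 0 under this sketch.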

Most AI features score 0 because the cost structure is dominated by foundation-model token costs, which many competitors get from the same providers at similar prices. The exceptions are at-scale features that have invested in distillation per the build-buy-or-fine-tune frame.

Question 6: Is distribution something competitors cannot match?

The distribution question. A capability inside a distribution channel competitors cannot replicate is moat-adjacent; the moat is partly the distribution and partly the capability, and the combination is harder to break than either alone.

Distribution moats are: existing customer relationships at scale, channel partnerships, regulated channels (healthcare, finance, government), proprietary integrations with platforms that gate access, network effects in the customer base.

Score 0 if the distribution is a generic SaaS GTM that any well-funded competitor could replicate. Score 1 if the distribution itself is differentiated: exclusive channel partnerships, regulated access, network effects.

Most AI startups score 0; their distribution is content marketing and outbound sales, which are not moats. AI features inside platforms with existing distribution (Microsoft Copilot in Office, Salesforce Einstein in CRM) score 1 because the distribution is structurally differentiated.

Question 7: Does regulatory or compliance positioning reinforce the moat?

The regulatory question. Regulated industries produce moat through compliance posture that is expensive and slow to replicate. SOC 2 Type II is not a moat; it is table stakes. HIPAA-compliant infrastructure with verified BAAs is more moat-adjacent. FedRAMP High with active customer contracts is meaningful moat. Domain-specific certifications (FDA for medical devices, ISO 13485, etc.) are real moat.

Score 0 if compliance is at table-stakes level (SOC 2, GDPR-compliant). Score 1 if compliance produces meaningful barriers to entry (FedRAMP, HIPAA at scale, FDA certifications, financial services regulatory positioning).

The compliance moat has a cost. Achieving it takes 12 to 24 months and significant investment. Organizations in regulated industries that have already paid that cost should treat it as moat; organizations not in regulated industries should not invent compliance moats that don’t exist.

Question 8: How deep is the integration into customer workflows?

The integration depth question. Capabilities that are deeply integrated into customer workflows (consumed multiple times per day, embedded in critical-path processes, configured against customer-specific data) are harder to displace than capabilities that are used episodically.

Integration depth is measured by usage frequency, criticality, and configuration. A capability used 50 times per day per user, on the critical path of a workflow, configured against the customer’s specific data and user roles, is deeply integrated. A capability used twice a week on a side workflow is shallowly integrated.

Score 0 if usage is episodic (under 5 times per day per user) or off-critical-path. Score 1 if usage is frequent, on-critical-path, and configured against customer-specific assets.
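The scoring rule above can be written as a small check over the three measures: frequency, criticality, and configuration. This is a minimal sketch; `UsageProfile` and its field names are hypothetical, and treating “frequent” as at least 5 uses per user per day is an assumption inferred from the score-0 cutoff.

```python
from dataclasses import dataclass


@dataclass
class UsageProfile:
    uses_per_user_per_day: float
    on_critical_path: bool
    customer_configured: bool  # configured against customer-specific data and roles


def integration_depth_score(profile: UsageProfile, min_daily_uses: float = 5.0) -> int:
    """Return 1 only when usage is frequent, on the critical path, and configured."""
    frequent = profile.uses_per_user_per_day >= min_daily_uses
    deep = frequent and profile.on_critical_path and profile.customer_configured
    return 1 if deep else 0


# The two examples from the text: 50 uses a day on the critical path,
# versus twice a week on a side workflow.
print(integration_depth_score(UsageProfile(50, True, True)))
print(integration_depth_score(UsageProfile(2 / 7, False, False)))
```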

This is often the question AI features are most likely to pass, because successful AI features tend to be heavily used. The score is meaningful when combined with other questions; alone, it indicates value but not full defensibility.

Question 9: Is the required talent genuinely scarce?

The talent question. A capability that requires genuinely scarce talent to operate produces moat; competitors cannot replicate without the same talent, and the talent supply is constrained.

The talent has to be genuinely scarce, not just “hard to find at our budget.” Senior AI engineers are scarce in 2026. Specific domain expertise (legal AI, medical AI, financial AI) is scarcer. The combination of senior AI engineering and specific domain expertise is rare and slow to produce.

Score 0 if the talent required is something a competitor with reasonable budget could hire in 90 days. Score 1 if the talent requires either domain-specific scarcity or accumulated org-specific context that takes 12+ months to build.

The pattern that fails this question: orgs that label their AI engineers as “scarce talent” when the AI engineers are general senior engineers with three months of AI experience. That talent is not scarce; it is competitive but available.

Question 10: Is there a time-asymmetric learning loop?

The learning loop question. A real moat has a feedback loop where time itself is a moat; the org knows things by month 24 that competitors cannot know until month 30, because the org has been running the loop for longer.

Time-asymmetric loops require: continuous data collection, structured learning from the data, and a feedback path back into the product or model. The loop is asymmetric when the loop’s output cannot be replicated by a competitor running the same loop later, because the early data shaped the loop’s structure in ways that affect later iterations.

Score 0 if the org’s learning is from a static dataset, public data, or data also available to competitors. Score 1 if there is a documented loop that compounds time into capability, so that earlier-start orgs have measurable advantages over later-start orgs.

This is the rarest “yes” on the audit. Most AI features do not have time-asymmetric loops; they have static training and static eval, and a competitor starting today can match them in 12 months. The features that do have time-asymmetric loops tend to be in domains with continuous user feedback that flows directly into the model (search, recommendation, fraud detection, conversational AI in specific verticals).

Scoring and action plan

Total the yes answers. The score corresponds to a category and an action plan.

Score 7 to 10 (real moat). The capability is structurally defensible. The action plan is to invest in the moat’s reinforcement: strengthen the loops, deepen the integrations, expand the eval set, hire ahead of competitors. Build investment is justified. Treating the capability as moat in strategic decisions is correct.

Score 4 to 6 (partial moat). The capability has some moat properties but is not fully defensible. The action plan is to identify which questions scored 0 and decide whether the org can move them to 1, or whether the capability should be treated as infrastructure rather than moat. Build investment may be justified for the moat-positive properties; the moat-negative properties should be addressed through buy alternatives.

Score 0 to 3 (not a moat, infrastructure). The capability is infrastructure that the org has been calling moat. The action plan is to stop treating it as moat in strategic decisions, redirect build investment to capabilities that score higher, and consider whether the infrastructure should be bought rather than built (per the AI plumbing-vs-moat piece).
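The three bands above can be tallied mechanically once each question has a 0 or 1. Here is a minimal sketch of that tally; the question keys, the `classify` function, and the sample answers are hypothetical labels for the ten questions, not an official schema.

```python
from typing import Dict, Tuple

# One key per audit question, in order (names are illustrative).
QUESTIONS = (
    "replicability", "data_leverage", "eval_set", "switching_cost",
    "scale_economics", "distribution", "regulatory", "integration_depth",
    "talent_scarcity", "learning_loop",
)


def classify(answers: Dict[str, int]) -> Tuple[int, str]:
    """Total the ten 0/1 answers and map the total to its band."""
    missing = [q for q in QUESTIONS if q not in answers]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")
    if any(answers[q] not in (0, 1) for q in QUESTIONS):
        raise ValueError("each question scores 0 or 1")
    total = sum(answers[q] for q in QUESTIONS)
    if total >= 7:
        return total, "real moat: reinforce it"
    if total >= 4:
        return total, "partial moat: move the zeros or treat as infrastructure"
    return total, "infrastructure: stop calling it a moat, consider buying"


# Hypothetical audit of a single capability.
answers = {q: 0 for q in QUESTIONS}
answers.update({"integration_depth": 1, "switching_cost": 1})
print(classify(answers))
```

Running the tally per capability, rather than once for the whole stack, keeps each build-or-buy decision separate.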

The most important action of the audit is the reclassification of capabilities scored 0 to 3 from “build” to “buy.” Most organizations have 2 to 4 capabilities in this category that are consuming engineering capacity on the strength of moat reasoning that the audit shows to be wrong. The capacity recovered from the reclassification typically funds the strengthening of the genuine moats.

Frequently asked questions

How honest are most teams when they run the audit?

Not honest enough on the first pass. The first audit produces inflated scores because teams want their capability to be moat, and the scoring rubric is interpretive. The second audit, run by a senior leader who is deliberately skeptical, produces the realistic score. The pattern is that the realistic score is typically 2 to 3 points lower than the first-pass score.

Should we run the audit per-capability or for the whole AI stack?

Per-capability. Each AI capability has its own moat profile. A search capability might score 7 (high integration, time-asymmetric loop, deep customer data) while a content generation capability scores 2 (commodity infrastructure with no compounding leverage). Aggregating to “is our AI a moat?” loses the per-capability decisions that the audit is meant to inform.

What if our capability scores 6: moat or infrastructure?

Six is a partial moat. The action plan is to look at which questions scored 0 and decide whether they can be moved to 1 within a quarter. If yes, the partial moat is on a trajectory to real moat and should be invested in. If no, the partial moat is a stable mid-tier asset: defensible against weak competition, vulnerable to strong competition. Strategic decisions should reflect the realistic ceiling.

How often should we run the audit?

Quarterly, per the matrix’s seventh principle. The competitive landscape shifts, and capabilities move between categories. A capability that scored 7 a year ago may score 4 today because vendors have caught up on Question 1. The quarterly cadence catches the drift while it is recoverable.

Does the audit apply to non-AI capabilities?

Approximately, with translation. The structural questions (replicability, switching cost, distribution, integration depth, talent scarcity) are general moat questions. The AI-specific questions (eval set, data leverage, time-asymmetric learning) translate to the equivalent in other capabilities. The 10-question structure is the AI version of a broader moat audit.

What’s the most common failed question?

Question 1 (could a competitor buy this in 60 days?). Most AI capabilities labeled as moats can be replicated through vendor purchase faster than the org acknowledges. The next most common failure is Question 2 (data leverage); most orgs collect data but do not have the infrastructure to convert it into capability improvement.

Are there industries where AI moats are easier to build?

Yes. Regulated industries (Question 7), industries with structurally proprietary data (Question 2), and industries with deep workflow integration (Question 8) tend to score higher. Generic SaaS AI features tend to score lower because many of them rely on undifferentiated infrastructure.

How does this interact with the build-vs-buy decision matrix?

Directly. The matrix says build the moat, buy the rails. The audit is the test for whether a “build” decision is producing a moat. Capabilities that score 7+ are correctly built. Capabilities that score 0-3 are infrastructure and should be bought, freeing capacity for moat work.

What if our capability scores low but customers tell us it’s our differentiator?

Customers’ perception of differentiation is signal but not proof. Customers may be correct (the differentiator is real but uncodified by the audit) or incorrect (the differentiator is a perception that competitive pressure will dissolve). The audit’s structural questions are the test. If customers’ perceived differentiation does not survive the structural questions, the differentiation is fragile.

How do we communicate the audit results to leadership?

Frame the audit as a strategic clarification, not as a value judgment. Capabilities that score low are not bad; they are infrastructure that the org needs but should treat as infrastructure. Capabilities that score high are the moat investments that deserve disproportionate engineering capacity. The audit clarifies where to spend, which is a productive conversation.

Key takeaways

The 10-question AI moat audit separates real defensibility from cosmetic infrastructure. The questions are structural: replicability, data leverage, eval-set privacy, switching cost, scale economics, distribution, regulatory positioning, integration depth, talent scarcity, and time-asymmetric learning loops. Each scores 0 or 1; the total corresponds to a category and an action plan.

Most AI capabilities score 2 to 4 on a serious audit. That is not damning; most capabilities are infrastructure, and infrastructure is fine. The damage comes from labeling infrastructure as moat and making strategic decisions on the wrong category. The audit’s primary value is the reclassification of capabilities from “build because moat” to “buy because infrastructure,” which frees engineering capacity for the genuine moats.

Real moats (score 7+) deserve disproportionate investment and explicit reinforcement. Partial moats (score 4 to 6) deserve a plan to either strengthen or accept as infrastructure. Non-moats (score 0 to 3) deserve to stop being called moats in strategic decisions; they are infrastructure, and the org’s planning should reflect that.

The audit is run quarterly because the competitive landscape shifts. Vendors catch up on Question 1. Distillation services commoditize Question 5. Frontier model improvements erode Question 9. The capability that scored 7 last year may score 4 today, and strategic plans should update accordingly.

Last Updated: Jun 16, 2026

Arthur Wandzel

SFAI Labs helps companies build AI-powered products that work. We focus on practical solutions, not hype.
