Home About Who We Are Team Services Startups Businesses Enterprise Case Studies Blog Guides Contact Connect with Us
Back to Guides
Enterprise Software 14 min read

The AI Project Red-Team Budget: A Defensible % Allocation

The AI Project Red-Team Budget: A Defensible % Allocation

Three to seven percent of total project spend on red-team is the defensible 2026 allocation for an AI project. The percentage is small relative to the eval budget but structurally non-negotiable, because adversarial behavior is measured against a different distribution than workload behavior; and folding the two together produces neither. A project that funds workload evaluation at 30 percent and adversarial evaluation at zero percent has measured half of the system. The half it did not measure is where reputational and regulatory failures originate.

This is a spoke under the AI project economics manifesto, which argues that evaluation cost has replaced feature cost as the unit of account. Red-team is evaluation against an adversarial distribution; the budget percentage is what makes that evaluation operational.

Why red-team is a separate line from the eval budget

Workload evaluation and adversarial evaluation measure against different input distributions.

Workload evaluation measures the system against the inputs it is supposed to handle correctly; the production distribution as observed from real users. The eval set is curated to be representative of that distribution. Threshold calibration is performed against samples drawn from it. Regression detection is calibrated to alert when behavior on this distribution shifts.

Adversarial evaluation measures the system against inputs an attacker constructs to exploit it. The distribution is by definition out-of-sample for the workload eval. It includes prompt-injection attempts, jailbreak constructs, retrieval-poisoning attacks, agentic-action abuse, and threat models that did not exist when the workload eval was locked. The adversarial distribution evolves continuously as attackers learn the system; the workload distribution evolves with the underlying business.

Folding the two into a single eval line typically produces eval sets that are over-stratified on production-realistic inputs and under-stratified on adversarial inputs. The signal coming back from the unified eval is a weighted average across distributions that should not be averaged. A regression on adversarial inputs gets masked by stable behavior on workload inputs. A regression on workload inputs gets masked by stable behavior on adversarial inputs. Neither line gets the rigor it requires.

Separating the budgets converts each into a measurable activity with defined inputs, defined outputs, and defined thresholds. The cost of the separation is two budget lines instead of one; the benefit is that each line measures what it claims to measure.

What 3 to 7 percent funds

The empirical 2026 distribution of red-team budget across the work it funds:

Red-team budget lineShare of red-team budgetNotes
Pre-launch adversarial probing25 to 35 percentLargest line on first release
Quarterly internal red-team cycles20 to 30 percentCadence-based ongoing work
External red-team engagement15 to 25 percentAnnual on customer-facing systems
Threat-model maintenance5 to 10 percentUpdates as landscape evolves
Remediation cycles15 to 25 percentFixes for issues found
Test infrastructure and reporting5 to 10 percentAuditable trail of findings

The pre-launch line is the largest on first releases because the system has rarely been adversarially probed. Subsequent releases run smaller pre-launch lines because the threat-model coverage carries forward. The external-engagement line is the second-most-misallocated; typically it is treated as optional and skipped, which produces internal red-team that converges on the threat models the build team has already considered.

The remediation line is structurally important. A red-team that finds issues it cannot fund the remediation of produces a known-bad system the team has documented but not improved. The mature 2026 budget treats remediation as a guaranteed component of red-team spend rather than as a contingency, because most red-team round produces remediation work.

The five threat models in scope for enterprise AI

Five threat models cover most enterprise AI workloads in 2026.

Prompt injection from user inputs. An attacker constructs an input designed to override system instructions, extract internal prompts, or redirect the model’s behavior. This is the canonical AI threat model and the one most teams already test for, though typically not at the depth the actual threat surface requires.

Jailbreak attempts that bypass system instructions. Variants of prompt injection focused on getting the model to produce content the system instructions prohibit; unsafe outputs, off-policy responses, role-play escapes. The threat model evolves continuously as attackers publish new techniques.

Data exfiltration through model outputs or tool calls. An attacker tries to extract training data, system prompts, internal context, or data the model has access to via tool calls. The exfiltration channels include direct extraction, encoding attacks, and side-channel inference from response timing or structure.

Agentic-action abuse. For systems with tool-calling capability, an attacker tries to induce unauthorized tool use, escalate privilege through chained tool calls, or produce irreversible state changes the user did not request. The blast radius is structurally larger than non-agentic threat models because the failure produces real-world side effects.

Indirect prompt injection through retrieved documents. An attacker plants adversarial content in a document the AI system later retrieves. The retrieval step launders the attack; the user did not type it, the model did not hallucinate it, but the model now executes it as if it were trusted instructions. This is the most-undertested threat model in 2026 and it scales linearly with retrieval-augmented generation deployment.

Higher-stakes deployments add domain-specific threats. Medical systems test for medical-advice abuse. Legal systems test for legal-substantiation manipulation. Financial systems test for unsafe trading or recommendation attacks. These add red-team budget proportional to the regulatory exposure of the domain.

Internal versus external red-team mix

The defensible mix for a customer-facing AI system:

Internal continuous red-team. Automated adversarial test suites running in CI on most release. Manual internal red-team review on most major version. The internal team understands the system, the threat models they have considered, and the prior remediation history. They produce coverage on known threat models at high cadence and low cost.

External annual red-team. A contracted external team with no prior context on the system runs adversarial probing for a defined window; typically two to four weeks. They find threat models the internal team had not considered, threat models published in research after the internal team last refreshed, and threat models specific to attacker patterns the internal team is not exposed to.

The external engagement is structurally necessary because internal teams converge. They develop intuition for the threat models they have already tested and stop developing intuition for the threat models they have not. External red-team breaks this convergence by importing threat-model diversity the internal team cannot generate. Skipping the external engagement produces internal red-team that grades its own homework against a stable rubric; and the rubric drifts away from the actual threat landscape over time.

For internal-only systems with low blast radius, the external engagement is sometimes defensible to skip. For customer-facing systems, regulated systems, and systems with agentic action capability, it is structurally non-negotiable. The cost band runs $25,000 to $80,000 per annual engagement depending on system complexity and threat-model scope.

The cadence that keeps red-team honest

Red-team without cadence is theater. The defensible cadence runs four overlapping rhythms.

Continuous automated red-team in CI. Most pull request runs against an adversarial test suite covering known threat models. Failures block merge. The suite is maintained against threat-model evolution. This is the cheapest line per unit of coverage and catches the regressions that come from feature changes drifting the system into adversarial vulnerability.

Manual internal red-team on most major release. Before each major version ships, the internal team runs a structured red-team round against the release candidate. Findings get triaged and either remediated pre-launch or documented as accepted residual risk. This is where most teams already operate, though typically with less structure than the cadence requires.

Quarterly structured red-team review. Once a quarter, the team runs a fresh red-team round against the current production system, refreshing threat models against the published research and observed attacker patterns. The review produces a written report documenting threat-model coverage, findings, and remediation backlog. This is the cadence most teams skip and the one that prevents drift away from the threat landscape.

Annual external red-team for customer-facing systems. Once a year, an external team runs adversarial probing against the production system. Findings feed the threat-model catalog and remediation backlog for the following year. This is the cadence that maintains threat-model diversity.

Higher-stakes deployments compress these cadences. Medical and legal systems often run quarterly external red-team rather than annual. Lower-stakes internal tools can extend the cadences; annual structured review with no external engagement is sometimes defensible.

The most-missed threat model

Indirect prompt injection through retrieved documents is the most-undertested threat model in 2026 enterprise AI red-teams.

Teams red-team direct user inputs aggressively. Prompt injection through the chat field gets continuous coverage. Jailbreak attempts get continuous coverage. The agentic-action abuse threat model gets coverage on most systems with tool-calling capability.

The threat model that consistently gets missed is the one where the attacker does not interact with the system at many. The attacker plants adversarial content in a document, web page, email, or knowledge-base entry that the AI system will later retrieve. When the system retrieves the document and processes its content as part of its context window, the adversarial instructions execute. The user did not type them. The model did not hallucinate them. They came in through the trusted retrieval channel.

This threat surface scales linearly with retrieval-augmented generation deployment. Most RAG system is a candidate. Most agent that browses the web is a candidate. Most document-summarization workflow that processes user-uploaded documents is a candidate. The exposure is structurally larger than direct prompt injection because the attack surface is the entire corpus the system can retrieve from, not just the user input field.

A defensible 2026 red-team budget allocates explicit threat-model coverage to indirect prompt injection. The remediation patterns; content sanitization at retrieval time, instruction-isolation prompting, output filtering, agentic-action confirmation gates; are well-known but unevenly applied. The dynamics of the residual risk this leaves on a project budget are detailed in the AI project insurance line.

Stratification by deployment posture

The red-team budget percentage shifts with deployment posture.

Deployment postureRed-team budget % of total spendDriver
Internal tool, no agentic action2 to 3 percentSmallest threat surface
Customer-facing, non-regulated4 to 5 percentDefault; brand and trust at stake
Customer-facing with agentic action5 to 6 percentTool-call threat surface
Regulated or high-stakes6 to 7 percentRegulator-noticed failure tail risk

Regulated workloads run at the upper end of the band because regulator-noticed failures carry tail risk that does not appear in standard incident-cost models. Agentic systems run higher than non-agentic systems at the same domain because the blast radius of an exploited agent is structurally larger; irreversible state changes, unauthorized tool use, escalating chains. Internal tools without agentic action can run lighter because the failure surface is bounded by the operator catching it.

The interaction with the AI project model routing economics is worth naming. Routing architectures that send sensitive inputs to higher-tier models add a routing-policy threat model; an attacker who can manipulate the routing decision can downgrade their input to a less-evaluated model tier. This requires explicit red-team coverage on the router itself, not just on the underlying models.

Frequently asked questions

Should red-team be separately contracted from the build team or part of the same engagement? Same engagement for internal red-team; separately contracted for external red-team. The internal team has the context to red-team continuously; the external team has the independence to find the threat models the internal team missed.

How does the red-team budget interact with security review and penetration testing on traditional infrastructure? They are complementary, not duplicative. Traditional pen-test covers infrastructure, authentication, and code-level vulnerabilities. AI red-team covers model behavior, prompt-injection threat surface, and agentic-action abuse. A defensible budget funds both as separate lines.

Should red-team findings be public or confidential? Confidential during the remediation window, then summarized publicly when remediation completes for customer-facing systems. The summary documents threat-model coverage and remediation patterns without exposing exploit details. This builds buyer trust without arming attackers.

What is the right way to communicate the red-team budget to a CFO? Frame it as the AI-project equivalent of the security audit line on a financial-services build. Both are non-negotiable percentages of project cost driven by the failure-cost coefficient of the deployment domain.

Does the red-team budget percentage change if the system uses an open-weight model versus a frontier model? Slightly higher on open-weight models because the threat surface includes weights extraction and adversarial fine-tuning attacks that frontier-vendor systems carry on the vendor’s side. The percentage shifts by 1 to 2 points.

Should the red-team budget cover compliance and certification work like SOC 2 or HIPAA? Partially. Compliance-driven adversarial testing is in scope for the red-team budget; the broader compliance certification work is a separate line. Mixing them tends to produce red-team that is optimized for compliance evidence rather than for threat-model coverage.

How does the red-team budget evolve over the project lifecycle? Roughly 60 percent in the build phase and 40 percent in operating phase, similar to the eval budget. Build-phase work covers threat-model definition, pre-launch probing, and remediation. Operating-phase work covers the continuous and quarterly cadences plus the annual external engagement.

What governance change makes red-team operational? Two changes. First, red-team findings have a named remediation owner with a defined SLA; typically 30 days for high-severity, 90 days for medium-severity. Second, the quarterly review is a standing item on the project review calendar rather than an exception. Together these convert red-team from periodic event into institutional practice.

Does red-team apply to AI features that wrap a managed vendor API rather than a self-hosted model? Yes. The threat surface is the system’s prompt structure, retrieval pipeline, and agentic action layer; many of which are the customer’s responsibility regardless of where the model itself runs. Skipping red-team on managed-API projects is a common 2026 failure mode.

How does red-team relate to the cost-side root causes of runaway AI projects? Closely. Inadequate red-team is a leading indirect cause of runaway projects because adversarial findings discovered post-launch require unbudgeted remediation work that displaces planned capability. The dynamics are documented in the anatomy of a runaway AI project.

Key takeaways

  • Three to seven percent of total project spend on red-team is the defensible 2026 allocation. Customer-facing non-regulated systems land at 4 to 5 percent; regulated and agentic systems land at 6 to 7 percent.
  • Red-team and eval are separate budget lines because they measure against different distributions. Folding them produces eval sets that are over-stratified on workload inputs and under-stratified on adversarial inputs.
  • Five threat models cover most enterprise AI: prompt injection, jailbreak, data exfiltration, agentic-action abuse, and indirect prompt injection through retrieved documents.
  • Indirect prompt injection through retrieval is the most-undertested threat model and scales linearly with RAG deployment. Most RAG system is a candidate; most are not adequately tested.
  • The defensible cadence runs continuous automated red-team in CI, manual internal red-team on releases, quarterly structured review, and annual external engagement for customer-facing systems.
  • Internal red-team is necessary for cycle frequency; external red-team is necessary for threat-model diversity. Skipping the external engagement produces internal red-team that converges on its own rubric.
  • The red-team budget reduces the insurance reserve required. Together they cover prevention and response across the full risk surface of an enterprise AI feature.

Last Updated: May 9, 2026

AW

Arthur Wandzel

SFAI Labs helps companies build AI-powered products that work. We focus on practical solutions, not hype.

See how companies like yours are using AI

  • AI strategy aligned to business outcomes
  • From proof-of-concept to production in weeks
  • Trusted by enterprise teams across industries
Get in Touch →
No commitment · Free consultation

Related articles