
The AI project egress problem: when cloud bills spike unexpectedly

Most AI cloud-bill surprises do not come from inference. They come from egress: traffic that leaves your cloud or region, whether from your application VPC to the model API, from the model API to your trace pipeline, from your trace store to the eval platform, or from one region to another for replay. Egress is invisible until the invoice arrives, and then it dominates the conversation. On a typical 2026 AI project, ungoverned egress runs $8k to $30k a year above what a deliberately architected stack would pay; spikes during a busy launch month can add another $10k to $20k in a single billing cycle. This piece names the four egress paths that drive most of the cost, the architectural moves that contain them, a diagnostic playbook that turns a surprise into a known cause within 90 minutes, and a defensible egress budget as a percentage of total AI project spend.

The argument sits inside the AI project economics manifesto: if inference is a pass-through line and observability is COGS, then egress is the friction tax that sits between the two. Egress that supports the eval loop is COGS. Egress that crosses cloud boundaries needlessly is waste, and it is one of the largest sources of “wait, what is that line on our cloud bill?” moments on the engagements we audit.

Why egress is the surprise line item

Three structural features of AI workloads make egress harder to budget than in conventional cloud applications.

The model API often lives outside your cloud. A workload running in AWS that calls Anthropic, OpenAI, or another model provider hosted on a different cloud pays cross-cloud egress on every request and every response. The model API call from AWS to Anthropic on GCP is metered at $0.05 to $0.09 per GB on the AWS side and shows up nowhere on the model provider’s invoice. On a workload that ships 5GB of prompt-and-response traffic per day, that is roughly $7.50 to $13.50 per month, or $90 to $160 per year, per workload, and most teams do not see it until the cloud bill arrives.
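The arithmetic is easy to reproduce. A minimal sketch, assuming a single flat per-GB rate and a 30-day month; real cloud egress pricing is tiered by monthly volume, so the hypothetical `egress_cost` helper below gives a rough range rather than an invoice figure:

```python
def egress_cost(gb_per_day: float, rate_per_gb: float, days: int) -> float:
    """Dollar cost of sending gb_per_day across one metered boundary for `days` days.

    Assumes a single flat $/GB rate; real pricing is tiered by monthly volume.
    """
    return gb_per_day * days * rate_per_gb


# 5 GB/day of prompt-and-response traffic at the low and high ends of the quoted range.
for rate in (0.05, 0.09):
    monthly = egress_cost(5, rate, days=30)
    yearly = egress_cost(5, rate, days=365)
    print(f"${rate:.2f}/GB: ${monthly:,.2f} per month, ${yearly:,.2f} per year")
```

Multiplying by the number of workloads that share the same cross-cloud path gives the portfolio figure.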

Trace and replay flows run cross-region by default. The trace pipeline often emits to a vendor in a different region or different cloud than the application; the eval platform then pulls traces from that vendor into yet another region. Each hop is metered. A trace volume of 200GB per month flowing from AWS to a vendor costs $10 to $18 per month (roughly $120 to $220 per year) in pure egress fees that nobody allocated, and that is before the second and third hops are counted.
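Because each hop is billed on its own, a pipeline’s egress is the sum over its hops. A minimal sketch, assuming flat per-GB rates and that every hop lands on a bill you pay (in practice a vendor may absorb or pass through its own egress); the `Hop` type, hop names, and rates are illustrative:

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    gb_per_month: float
    rate_per_gb: float  # assumed flat $/GB for the boundary this hop crosses


def monthly_pipeline_cost(hops: list[Hop]) -> float:
    """Each hop is metered independently, so the pipeline cost is the sum over hops."""
    return sum(hop.gb_per_month * hop.rate_per_gb for hop in hops)


trace_path = [
    Hop("application -> trace vendor (cross-cloud)", 200, 0.09),
    Hop("trace vendor -> eval platform (cross-region)", 200, 0.02),
]

for hop in trace_path:
    print(f"{hop.name}: ${hop.gb_per_month * hop.rate_per_gb:,.2f}/month")
print(f"total: ${monthly_pipeline_cost(trace_path):,.2f}/month")
```

Adding a replay copy into a third region is one more `Hop` in the list; nothing about the existing hops changes.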

Egress pricing is asymmetric and confusing. Within a single cloud region, traffic is free or close to it (some providers bill a small per-GB charge between availability zones). Across regions, it is $0.01 to $0.04 per GB. Across clouds (cloud-to-internet), it is $0.05 to $0.09 per GB. Across continents (intercontinental cross-region), it is higher still. Architectural choices that look free at design time accrue cost at scale, and rates change often enough that a stack designed around 2023 pricing may be misallocated under 2026 pricing.

The right framing: egress is the friction tax on cross-boundary traffic. Every cloud boundary, region boundary, and provider boundary the data crosses is a metered hop. Most AI projects underestimate the number of hops in their data path by a factor of two or three.
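One way to stop undercounting hops is to write the data path down as a list of endpoints and classify each hop by the boundary it crosses. A minimal sketch using the rate ranges quoted above; the `Endpoint` type, the cloud and region labels, and the three-tier classification are assumptions, and the sketch treats same-region traffic as free and does not model cross-zone or intercontinental rates:

```python
from typing import NamedTuple


class Endpoint(NamedTuple):
    cloud: str   # where the endpoint is hosted, e.g. "aws" or "gcp" (illustrative labels)
    region: str  # e.g. "us-east-1"


# $/GB ranges from the text; same-region is treated as free (cross-zone charges ignored).
RATES = {
    "same-region": (0.0, 0.0),
    "cross-region": (0.01, 0.04),
    "cross-cloud": (0.05, 0.09),
}


def classify(src: Endpoint, dst: Endpoint) -> str:
    if src.cloud != dst.cloud:
        return "cross-cloud"
    if src.region != dst.region:
        return "cross-region"
    return "same-region"


def metered_hops(path: list[Endpoint]) -> list[str]:
    """Boundary type of every hop in the path that carries a per-GB charge."""
    hops = [classify(a, b) for a, b in zip(path, path[1:])]
    return [h for h in hops if RATES[h][1] > 0]


def cost_range(path: list[Endpoint], gb: float) -> tuple[float, float]:
    """Low and high cost of pushing `gb` through every hop of the path."""
    hops = [classify(a, b) for a, b in zip(path, path[1:])]
    return sum(RATES[h][0] * gb for h in hops), sum(RATES[h][1] * gb for h in hops)


path = [
    Endpoint("aws", "us-east-1"),     # application
    Endpoint("gcp", "us-central1"),   # trace vendor ingest
    Endpoint("gcp", "europe-west1"),  # eval platform
]
print(metered_hops(path))
print(cost_range(path, 100))
```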

The four egress paths that drive most of the cost

Across the engagements we have run and audited, four egress paths drive almost all of the cost.

Application-to-model-API egress. The cross-cloud hop from your application VPC to the model provider. Typically 60 to 80 percent of payload-in-transit by volume, and the largest egress line on most AI workloads. Mitigation usually involves either co-locating the workload with the model provider (when the provider runs on a single cloud) or using cross-cloud private connectivity (Direct Connect, Cloud Interconnect, or vendor-specific private endpoints).

Trace pipeline egress. Traffic from the application to the trace vendor. On a healthy observability setup, this is 100 to 300GB per month for a mid-sized workload: small relative to model traffic, but expensive because trace pipelines often run cross-region or cross-cloud by default. We discuss the trace storage side of this in the AI project storage tax piece.

Eval and replay traffic. The eval cycle pulls historical traces and replay logs into the eval platform. On model-upgrade cycles (three to five times per year) this can spike to 500GB to 2TB in a single week. Spikes during an eval cycle are the most common source of unexpected cloud-bill jumps; teams forget that re-evals consume bandwidth as well as compute.

Cross-region replication and disaster recovery. Multi-region deployments replicate traces, replay logs, and vector indices across regions. Often the replication pattern was inherited from a non-AI workload and over-replicates by 2x to 5x. Mitigation: re-evaluate replication scope on a per-class basis; not every storage class needs full multi-region replication.

These four paths (application-to-model, trace pipeline, eval and replay, cross-region replication) typically account for 80 to 90 percent of total AI egress cost. The remaining 10 to 20 percent is small individual lines that rarely justify optimization effort.

Architectural moves that contain egress

Three architectural moves, applied deliberately, contain egress cost without compromising the eval loop or the operational surface.

Co-locate workloads with the model provider where possible. If your traffic concentrates on a single model provider running on a single cloud, running your application on that same cloud eliminates the cross-cloud hop on every request. A workload running on AWS that calls Anthropic models through AWS Bedrock pays no cross-cloud egress on the model call. The same workload calling Anthropic’s direct API from AWS pays cross-cloud egress on every byte. The savings on a high-volume workload are typically $10k to $40k per year.
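A quick sanity check is to invert the arithmetic: how much monthly model traffic does a given yearly saving imply? A minimal sketch, assuming a flat cross-cloud rate and a free same-cloud path; the helper names are hypothetical, and only the data-transfer line is modeled (token prices on a hosted model service can differ from the provider’s direct API):

```python
def annual_colocation_savings(monthly_gb: float, cross_cloud_rate: float,
                              same_cloud_rate: float = 0.0) -> float:
    """Yearly egress avoided by moving model calls onto the application's cloud."""
    return monthly_gb * 12 * (cross_cloud_rate - same_cloud_rate)


def monthly_gb_for_savings(target_annual_savings: float, cross_cloud_rate: float,
                           same_cloud_rate: float = 0.0) -> float:
    """Monthly model traffic needed before co-location saves `target_annual_savings`."""
    return target_annual_savings / (12 * (cross_cloud_rate - same_cloud_rate))


for target in (10_000, 40_000):
    for rate in (0.05, 0.09):
        tb = monthly_gb_for_savings(target, rate) / 1000
        print(f"${target:,} per year at ${rate:.2f}/GB needs about {tb:.1f} TB per month")
```

The output makes “high-volume” concrete: savings in that range correspond to model traffic measured in tens of terabytes per month.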

Run the trace pipeline in-region. Whether you self-host your observability stack or use a managed vendor, push the team to host the trace ingest endpoint in the same region as the application. Most managed observability vendors offer regional ingest endpoints; using the wrong one can double or triple the egress on the trace pipeline.

Cap eval and replay traffic on schedule, not on demand. A re-eval that pulls 2TB of replay logs in a single afternoon spikes the egress line in ways that smooth pulls do not. Schedule re-evals into off-peak windows where possible, and pre-stage replay data in the eval platform’s region rather than streaming it during the eval run. We connect the broader cycle of model-upgrade cost to overall economics in the AI project compounding return piece.
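Scheduling and pre-staging can be combined: copy the replay set into the eval region once, in capped chunks over off-peak windows, and let every eval run read the local copy. A minimal sketch; the helper names, the 500 GB nightly cap, and the 2 TB replay set are hypothetical. On the major clouds, data-transfer rates generally do not vary by time of day, so the cap buys predictability rather than a lower rate; the saving comes from crossing the boundary once instead of once per eval run:

```python
from datetime import datetime, timedelta


def plan_staging(total_gb: float, cap_gb_per_window: float, first_window: datetime,
                 spacing: timedelta = timedelta(days=1)) -> list[tuple[datetime, float]]:
    """Split one pre-staging copy of replay data into capped chunks, one per off-peak window."""
    if cap_gb_per_window <= 0:
        raise ValueError("cap_gb_per_window must be positive")
    plan = []
    remaining = total_gb
    start = first_window
    while remaining > 0:
        chunk = min(cap_gb_per_window, remaining)
        plan.append((start, chunk))
        remaining -= chunk
        start += spacing
    return plan


def boundary_gb(dataset_gb: float, eval_runs: int, prestaged: bool) -> float:
    """GB crossing the boundary: once if pre-staged, once per run if streamed."""
    return dataset_gb if prestaged else dataset_gb * eval_runs


# Hypothetical numbers: a 2 TB replay set, 500 GB per nightly window starting at 02:00.
for start, gb in plan_staging(2_000, 500, datetime(2026, 5, 11, 2, 0)):
    print(start.isoformat(), gb)
print(boundary_gb(2_000, eval_runs=3, prestaged=False),
      boundary_gb(2_000, eval_runs=3, prestaged=True))
```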

These three moves (co-location, in-region trace ingest, scheduled and pre-staged eval pulls) typically eliminate 50 to 75 percent of total egress cost on a workload that has not been deliberately architected for egress economics.

The 90-minute diagnostic playbook

When the cloud bill spikes unexpectedly, the diagnostic should land within 90 minutes. The order that works:

  1. Pull the cloud provider’s cost-and-usage report for the affected month, filtered to data-transfer line items.
  2. Group by source region, destination region, and destination service. The largest line is usually the cause.
  3. Cross-check against the application’s known data paths. If the largest line is application-to-model-API, the cause is usually a launch-month volume spike. If it is trace egress, the cause is usually retention policy or pipeline routing. If it is cross-region replication, the cause is usually an over-replicated storage class.
  4. Compare against the previous 90 days’ average. A 2x or larger jump on a single line is the cause; smaller jumps distributed across many lines usually indicate organic growth.
  5. Bring the architectural fix that addresses the largest line first. Egress optimization is Pareto-distributed; the top one or two paths usually carry 70 to 80 percent of the total.
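Steps 1, 2, 4, and 5 are mechanical enough to script; step 3 remains a human cross-check against the known data paths. A minimal sketch over a cost-and-usage export loaded as a list of rows. The field names and sample figures are hypothetical stand-ins, since real exports use provider-specific column names, and flagging a line that has no baseline at all is an added assumption:

```python
from collections import defaultdict

rows = [
    {"category": "data-transfer", "source_region": "us-east-1", "destination_region": "internet",
     "destination_service": "model-api", "cost": 900.0},
    {"category": "data-transfer", "source_region": "us-east-1", "destination_region": "internet",
     "destination_service": "model-api", "cost": 300.0},
    {"category": "data-transfer", "source_region": "us-east-1", "destination_region": "us-west-2",
     "destination_service": "trace-store", "cost": 60.0},
    {"category": "compute", "source_region": "us-east-1", "destination_region": "us-east-1",
     "destination_service": "ec2", "cost": 5000.0},
]

# Hypothetical monthly average over the previous 90 days, keyed the same way.
baseline = {
    ("us-east-1", "internet", "model-api"): 500.0,
    ("us-east-1", "us-west-2", "trace-store"): 55.0,
}


def group_transfer_costs(rows):
    """Steps 1-2: keep data-transfer items, group by source, destination, and service."""
    totals = defaultdict(float)
    for row in rows:
        if row["category"] != "data-transfer":
            continue
        key = (row["source_region"], row["destination_region"], row["destination_service"])
        totals[key] += row["cost"]
    return dict(totals)


def flag_jumps(current, baseline, ratio=2.0):
    """Step 4: lines at least `ratio` times their 90-day average, largest first.

    A line with no baseline at all is also flagged, since it is new this period.
    """
    flagged = []
    for key, cost in sorted(current.items(), key=lambda kv: kv[1], reverse=True):
        base = baseline.get(key, 0.0)
        if base == 0.0 or cost >= ratio * base:
            flagged.append((key, cost, base))
    return flagged


def top_share(current, n=2):
    """Step 5: fraction of total transfer cost carried by the n largest lines."""
    costs = sorted(current.values(), reverse=True)
    total = sum(costs)
    return sum(costs[:n]) / total if total else 0.0


current = group_transfer_costs(rows)
for key, cost, base in flag_jumps(current, baseline):
    print(f"{key}: ${cost:,.2f} vs 90-day average ${base:,.2f}")
print(f"top line share: {top_share(current, n=1):.0%}")
```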

We see the same diagnostic flow recur in the AI project burn-rate dashboard piece; egress is one of the lines a healthy burn-rate dashboard surfaces explicitly rather than burying inside “infrastructure other.”

How to size the egress budget

A defensible egress budget for a 2026 AI engagement at $250k total project cost sits at one to three percent (roughly $2.5k to $7.5k per year) when paths are deliberately architected, and four to eight percent ($10k to $20k) when they are not. The shape that fits a typical workload:

  • Application-to-model-API egress: 50 to 70 percent of the egress budget.
  • Trace pipeline egress: 15 to 25 percent.
  • Eval and replay traffic: 10 to 20 percent (concentrated on model-upgrade weeks).
  • Cross-region replication: 5 to 15 percent.
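The sizing rule and the per-path shape translate directly into a small budgeting helper. A minimal sketch; the percentages are the ranges quoted above, while the function names, the `posture` labels, and the use of the four-percent line as a review trigger are illustrative choices. The per-path ranges overlap and are not meant to sum to exactly 100 percent at either end:

```python
# Ranges from the text, as fractions: (low, high).
TOTAL_SHARE = {"architected": (0.01, 0.03), "ad hoc": (0.04, 0.08)}
PATH_SHARE = {
    "application-to-model-API": (0.50, 0.70),
    "trace pipeline": (0.15, 0.25),
    "eval and replay": (0.10, 0.20),
    "cross-region replication": (0.05, 0.15),
}
REVIEW_THRESHOLD = 0.04  # above four percent of total project cost, look for a failure mode


def egress_budget(project_cost: float, posture: str = "architected") -> tuple[float, float]:
    low, high = TOTAL_SHARE[posture]
    return project_cost * low, project_cost * high


def per_path_budget(egress_total: float) -> dict[str, tuple[float, float]]:
    return {path: (egress_total * lo, egress_total * hi)
            for path, (lo, hi) in PATH_SHARE.items()}


def needs_review(annual_egress: float, project_cost: float) -> bool:
    return annual_egress / project_cost > REVIEW_THRESHOLD


low, high = egress_budget(250_000)
print(f"architected egress budget: ${low:,.0f} to ${high:,.0f} per year")
for path, (lo, hi) in per_path_budget(high).items():
    print(f"  {path}: ${lo:,.0f} to ${hi:,.0f}")
print("review needed:", needs_review(12_000, 250_000))
```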

If the egress line creeps above four percent of total project cost, the cause is almost always one of the four failure modes below. We treat egress as part of the AI project total cost of ownership, specifically the friction-cost line that scales with cross-boundary traffic.

The four failure modes

Egress budgets fail in four characteristic ways.

Failure 1: Cross-cloud workload-to-model architecture without recognition. The team architects on AWS, calls a model provider running on GCP, and pays cross-cloud egress on every request without anyone ever making that architectural decision deliberately. Mitigation: at architecture-decision time, name which cloud the model provider runs on and what the projected egress cost will be at expected volume.

Failure 2: Trace pipeline routed through the wrong region. The trace vendor’s ingest endpoint defaults to a different region than the application, and the team rarely overrides the default. Mitigation: at trace-vendor onboarding, explicitly select the regional ingest endpoint that matches the application region.

Failure 3: Re-eval spikes attributed to “infrastructure” instead of egress. The team sees a $4k spike in a single week and chalks it up to “infrastructure” without diagnosing the cause. Six months later the same spike has happened four times. Mitigation: tag re-eval traffic distinctly so it shows up as a known periodic line rather than a surprise.

Failure 4: Over-replication of storage classes. The team inherits a multi-region replication policy from a non-AI workload and replicates trace and replay storage that does not need full geographic redundancy. Mitigation: per-class replication policy with a default of single-region for non-critical operational storage and multi-region only for compliance-critical data.
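The per-class policy is small enough to encode directly, which makes the default explicit and the replicated volume measurable. A minimal sketch; the storage class names, monthly write volumes, and the “two extra regions” of the inherited policy are hypothetical:

```python
from dataclasses import dataclass
from enum import Enum


class Replication(Enum):
    SINGLE_REGION = "single-region"
    MULTI_REGION = "multi-region"


@dataclass(frozen=True)
class StorageClass:
    name: str
    compliance_critical: bool


def replication_for(storage_class: StorageClass) -> Replication:
    """Default to single-region; replicate across regions only for compliance-critical data."""
    if storage_class.compliance_critical:
        return Replication.MULTI_REGION
    return Replication.SINGLE_REGION


def replicated_gb_per_month(classes, monthly_writes_gb, extra_regions):
    """GB copied to other regions each month under the per-class policy."""
    return sum(
        monthly_writes_gb[c.name] * extra_regions
        for c in classes
        if replication_for(c) is Replication.MULTI_REGION
    )


classes = [
    StorageClass("trace-store", compliance_critical=False),
    StorageClass("replay-logs", compliance_critical=False),
    StorageClass("audit-log", compliance_critical=True),
]
writes = {"trace-store": 300.0, "replay-logs": 500.0, "audit-log": 20.0}

inherited = sum(writes.values()) * 2  # inherited policy: everything to two extra regions
per_class = replicated_gb_per_month(classes, writes, extra_regions=2)
print(f"inherited: {inherited:,.0f} GB/month, per-class: {per_class:,.0f} GB/month")
```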

We see all four failure modes recur in our piece on the 6 hidden taxes on every AI project; egress is the most surprise-prone of the hidden taxes because it spikes rather than creeps.

Frequently asked questions

How much should we budget for egress on a $250k AI project?

One to three percent of total project cost when paths are deliberately architected (roughly $2.5k to $7.5k per year), and four to eight percent when they are not. The 3x to 4x gap between architected and ad-hoc egress is the single biggest lever in the egress budget.

Why is egress so much more expensive than ingress?

Cloud providers price ingress at zero or near-zero to encourage data to land on their infrastructure, then price egress at $0.05 to $0.09 per GB to discourage it from leaving. The asymmetry is deliberate; it shapes architectural choices in ways most teams do not recognize until the bill arrives.

Can we avoid all cross-cloud egress?

Usually not entirely, but most workloads can eliminate 60 to 80 percent of it through co-location with the model provider, in-region trace ingest, and scheduled eval traffic. The remaining 20 to 40 percent is the residual cost of integrations the workload genuinely needs.

What about private connectivity (Direct Connect, Cloud Interconnect)?

These help on high-volume cross-cloud paths, but they have minimum monthly fees that only pay back at scale. The breakeven is typically 5TB per month of cross-cloud traffic; below that, optimizing the architecture pays back faster than provisioning private connectivity.
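The breakeven follows from setting the two monthly costs equal: with a fixed monthly fee F and per-GB rates r_internet and r_private, private connectivity wins once monthly volume exceeds F / (r_internet - r_private). A minimal sketch; the fee and rates below are placeholders chosen so the breakeven lands at the 5TB figure above, not published prices, and real private-connectivity pricing usually has several components (port or attachment fees plus per-GB transfer):

```python
def breakeven_gb_per_month(monthly_fee: float, internet_rate: float, private_rate: float) -> float:
    """Monthly volume at which private connectivity costs the same as internet egress.

    Below this volume, the fixed fee outweighs the per-GB savings.
    """
    if private_rate >= internet_rate:
        raise ValueError("private per-GB rate must be below the internet rate to ever break even")
    return monthly_fee / (internet_rate - private_rate)


def monthly_cost(gb: float, rate: float, fee: float = 0.0) -> float:
    return fee + gb * rate


# Placeholder numbers, not quoted prices: $300/month fixed fee, $0.09/GB over the
# internet, $0.03/GB over the private link.
fee, internet, private = 300.0, 0.09, 0.03
breakeven = breakeven_gb_per_month(fee, internet, private)
print(f"breakeven: {breakeven / 1000:.1f} TB per month")
for gb in (1_000, 5_000, 10_000):
    print(gb, monthly_cost(gb, internet), monthly_cost(gb, private, fee))
```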

How do we monitor egress in real time?

Cloud-native cost-and-usage exports updated daily, plus an alerting rule that fires on any single data-transfer line that doubles week-over-week. The alerting rule catches spikes within 24 to 48 hours rather than waiting for the monthly invoice.
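The week-over-week rule needs two pieces: rolling daily export rows into weekly buckets, and comparing each line with its own previous week. A minimal sketch using ISO weeks from Python’s standard `datetime`; the `(date, line, cost)` row shape, the line names, and the `min_cost` floor that skips trivially small lines are assumptions. A line appearing for the first time has no prior week and is not flagged here:

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily_rows):
    """Sum daily (date, line, cost) rows into ISO-week buckets per line."""
    totals = defaultdict(float)
    for day, line, cost in daily_rows:
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week, line)] += cost
    return totals


def doubled_lines(daily_rows, iso_year, iso_week, ratio=2.0, min_cost=1.0):
    """Lines whose cost in the given ISO week is at least `ratio` times the week before."""
    totals = weekly_totals(daily_rows)
    monday = date.fromisocalendar(iso_year, iso_week, 1)
    prev_year, prev_week, _ = (monday - timedelta(days=7)).isocalendar()
    alerts = []
    for (year, week, line), cost in totals.items():
        if (year, week) != (iso_year, iso_week) or cost < min_cost:
            continue
        previous = totals.get((prev_year, prev_week, line), 0.0)
        if previous > 0 and cost >= ratio * previous:
            alerts.append(line)
    return sorted(alerts)


daily = []
for weekday in range(1, 8):
    daily.append((date.fromisocalendar(2026, 17, weekday), "model-api egress", 10.0))
    daily.append((date.fromisocalendar(2026, 18, weekday), "model-api egress", 25.0))
    daily.append((date.fromisocalendar(2026, 17, weekday), "trace egress", 5.0))
    daily.append((date.fromisocalendar(2026, 18, weekday), "trace egress", 6.0))

print(doubled_lines(daily, 2026, 18))
```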

Does accessing models through AWS Bedrock or Vertex AI help?

Yes, when your workload runs on the same cloud. AWS Bedrock for an AWS workload eliminates cross-cloud egress on the model call. The same workload calling the model provider’s direct API still pays cross-cloud egress. The savings are real and easy to quantify before deciding.

How does egress interact with model upgrades?

Re-evals are bandwidth-intensive; they pull replay logs and run them through new models. On model-upgrade weeks, egress can spike 2x to 4x. Pre-stage the replay data in the eval platform’s region or schedule the re-eval into off-peak windows. We discuss the broader model-upgrade cost in the AI project economics manifesto.

Where does egress show up on the P&L?

Inside the inference-and-infrastructure COGS line, alongside model API spend, license cost, and storage. Egress that is not classified as COGS (for example, lumped into “general cloud spend”) usually escapes operational scrutiny and grows unchecked.

Key takeaways

  • Egress is one to three percent of total AI project cost when architected, four to eight percent when not; the gap is the single biggest lever.
  • Four egress paths drive 80 to 90 percent of the cost: application-to-model-API, trace pipeline, eval and replay, cross-region replication.
  • Co-location with the model provider, in-region trace ingest, and scheduled eval pulls eliminate most of the avoidable cost.
  • A 90-minute diagnostic playbook turns a surprise spike into a known cause; the top one or two egress paths almost always carry 70 to 80 percent of the total.
  • Tag re-eval traffic distinctly so it shows up as a known periodic line rather than a surprise on model-upgrade weeks.

Last Updated: May 9, 2026

Arthur Wandzel

SFAI Labs helps companies build AI-powered products that work. We focus on practical solutions, not hype.