Internal AI chargeback in 2026 is a quiet disaster. Most enterprises inherited the seat-based chargeback model from their SaaS era; finance bills consuming departments per active user, slotted into the same budget line that used to fund a Salesforce license. The model worked for deterministic SaaS because per-user cost variance was small and quality was a fixed property of the software. AI breaks both assumptions: a heavy user can run 100x the inference of a light user on the same seat, and quality is a stochastic function of the eval bar the platform team maintains. The result is cross-subsidy that masquerades as accounting, no usage signal back to consuming teams, and a platform team eating costs it cannot defend. This piece decomposes why per-seat chargeback breaks for AI, specifies the four components of a model that fits, and walks through three implementations.
It is a spoke under the AI project economics manifesto, which argues AI economics has shifted from feature cost to evaluation cost. Internal chargeback is the operational expression of that shift.
Why per-seat chargeback breaks for AI
Per-seat chargeback works when three properties hold. First, per-user cost variance is small. Second, quality is a fixed property of the software, not a function of ongoing maintenance. Third, the consumed resource is bounded by the seat.
AI breaks all three. Usage variance between heavy and light users on the same seat plan routinely runs 20x to 100x; a heavy user running multi-step agent workflows burns the platform team’s inference budget while the light user barely registers. Quality is not fixed: the platform team maintains the eval suite that protects every consuming team from regressions on each model release, and that cost grows with the number of consuming teams. And usage is not seat-bounded: an agent acting on behalf of a single user can consume orders of magnitude more inference than the user could directly.
The visible failure mode: consuming teams treat AI as free. The seat fee is invisible and decoupled from usage. With no price signal, no team has reason to optimize prompt length, model selection, retrieval depth, or call frequency.
The invisible failure mode is worse. The platform team’s budget line gets cut because finance sees the per-seat chargeback as the AI cost, and the per-seat number does not include eval overhead, prompt-registry maintenance, or bursting capacity. The platform team is funded for a fraction of what it spends. Eval discipline degrades. Model releases ship without regression coverage. The price is paid in quality variance on production AI systems.
Chargeback that does not reflect actual consumption is cross-subsidy with extra steps. A working model has four components, each addressing a specific failure mode of the seat model.
Component 1: Per-action chargeback
The unit of AI work is the action: a prompt run, an agent step, a retrieval call, an eval pass. Per-action chargeback bills consuming teams for the actions they run, priced as input tokens plus output tokens times volume, plus a small platform overhead per action.
The price per action is published quarterly and refreshed as frontier model prices move. It is computed as the average input plus output token cost for the action class, marked up by a platform overhead (typically 10% to 20%) that funds the inference gateway, retrieval infrastructure, and observability stack.
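A minimal sketch of that price computation, reading the overhead as a markup on the average token cost. The class name, token counts, and per-million-token prices are hypothetical placeholders, not figures from any real price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionClass:
    """Average token footprint of one action class (illustrative values only)."""
    name: str
    avg_input_tokens: int
    avg_output_tokens: int


def price_per_action(
    action: ActionClass,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    overhead_rate: float = 0.15,
) -> float:
    """Average token cost for the class, marked up by the platform overhead."""
    if not 0.0 <= overhead_rate <= 1.0:
        raise ValueError("overhead_rate must be a fraction between 0 and 1")
    token_cost = (
        action.avg_input_tokens * input_price_per_mtok
        + action.avg_output_tokens * output_price_per_mtok
    ) / 1_000_000
    return token_cost * (1.0 + overhead_rate)


# Hypothetical chat call: 2,000 input and 500 output tokens on average,
# priced at $3 and $15 per million tokens, with a 15% platform overhead.
chat_call = ActionClass("chat_call", avg_input_tokens=2_000, avg_output_tokens=500)
chat_price = price_per_action(chat_call, 3.0, 15.0)
print(f"{chat_call.name}: ${chat_price:.4f} per action")
```

Recomputing the published list each quarter would then mean rerunning this function with the current model prices and refreshed token averages for each action class.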
The point is the price signal. A team that can see “this feature runs 50,000 actions per day at $0.04, costing $60,000 per month” can decide whether the feature is worth the cost. A team that cannot see the cost ships the feature, the cost lands on the platform team’s budget, and nobody owns the optimization.
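The arithmetic behind that figure can be checked in a few lines. This is a sketch that assumes a 30-day month, which is what the $60,000 figure implies; the function name is illustrative.

```python
from decimal import Decimal


def monthly_feature_cost(
    actions_per_day: int,
    price_per_action: Decimal,
    days_in_month: int = 30,  # assumption: the example figure implies 30 days
) -> Decimal:
    """Monthly chargeback for one feature at a flat per-action price."""
    return price_per_action * actions_per_day * days_in_month


cost = monthly_feature_cost(50_000, Decimal("0.04"))
print(f"${cost:,.2f} per month")
```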
Per-action chargeback is the internal mirror of the AI cost-per-action framework, the unit economics model that makes pricing legible regardless of model price moves.
Granularity matters. Per-token billing is illegible; a product manager does not think in tokens. Per-action billing, with a small number of action classes (chat call, retrieval call, agent step, eval run), translates token economics into the unit the consuming team thinks in.
Component 2: Eval-shared overhead
Eval costs are real, recurring, and grow with the number of consuming teams, but they do not split cleanly along action lines because the eval suite is partly platform overhead and partly consuming-team-specific. The working split is 50/50 between the platform team and the consuming team.
The platform team owns the eval harness, the baseline regression suite that runs on every model release, and the shared scaffolding (CI integration, scoring infrastructure, eval observability). Those costs are platform overhead, funded out of the per-action overhead margin.
The consuming team owns the domain-specific eval set, the rubric, and the threshold. Those costs scale with the team that owns the use case and are billed back as eval-action chargeback at the same per-action price as production calls.
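One way to picture the split is as a routing table from eval cost items to the side that funds them. The sketch below is illustrative only: the item labels are hypothetical, and the dollar figures are invented and chosen so the two sides land on an even split.

```python
from decimal import Decimal
from enum import Enum


class Owner(Enum):
    PLATFORM = "platform"        # funded from the per-action overhead margin
    CONSUMER = "consuming_team"  # billed back as eval-action chargeback


# Ownership follows the split described above; the keys are hypothetical labels.
EVAL_COST_OWNER = {
    "eval_harness": Owner.PLATFORM,
    "baseline_regression_suite": Owner.PLATFORM,
    "shared_scaffolding": Owner.PLATFORM,
    "domain_eval_runs": Owner.CONSUMER,
}


def eval_action_charge(eval_runs: int, price_per_eval_run: Decimal) -> Decimal:
    """Domain-specific eval runs are billed at the per-action price."""
    return price_per_eval_run * eval_runs


def split_eval_costs(costs: dict) -> dict:
    """Total eval costs per owner; an unclassified item raises KeyError."""
    totals = {Owner.PLATFORM: Decimal("0"), Owner.CONSUMER: Decimal("0")}
    for item, amount in costs.items():
        totals[EVAL_COST_OWNER[item]] += amount
    return totals


# Invented monthly figures for one consuming team.
monthly_costs = {
    "eval_harness": Decimal("4000"),
    "baseline_regression_suite": Decimal("3000"),
    "shared_scaffolding": Decimal("1000"),
    "domain_eval_runs": eval_action_charge(20_000, Decimal("0.40")),
}
totals = split_eval_costs(monthly_costs)
for owner, amount in totals.items():
    print(f"{owner.value}: ${amount:,.2f}")
```

Raising on unclassified items is a deliberate choice in this sketch: a new eval cost has to be assigned an owner before it can be billed to either side.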
The 50/50 split is the structural fix to a problem that destroys eval discipline if left alone. If the platform team funds 100% of evals, consuming teams under-invest in domain-specific evaluation because it is free to them. If consuming teams fund 100%, they cut eval coverage when budgets tighten and the platform team loses visibility into regression rates.
The split is also a forcing function on the eval bar. A consuming team that sees eval-action costs on its monthly bill has a price signal on eval frequency and rubric design. This is detailed in "stop budgeting AI projects in story points, budget them in eval runs": the eval run is the unit, and the chargeback model is what makes the unit visible.
Component 3: Prompt-registry amortization
The prompt registry is shared infrastructure with strong network effects. A registered prompt has versioning, eval coverage, ownership metadata, and rollback capability that an unregistered prompt does not, and every team that registers a prompt makes the registry more valuable to every other team.
Per-prompt chargeback breaks this. If teams pay per prompt registered or per pull from the registry, they will avoid registering prompts to dodge the chargeback, the registry fragments, and the network effect inverts.
The working model is amortization. Registry maintenance (engineering time on tooling, storage, eval coverage on registered prompts, ownership rotation infrastructure) is amortized across all consuming teams as a percentage layered onto the per-action overhead. Typical numbers: 2% to 5% of per-action overhead funds the registry.
The amortization is invisible to consuming teams (rolled into the per-action price), which preserves the incentive to register. The platform team funds the registry from the amortization line, which scales with consumption; the more teams use AI, the larger the registry budget grows, which matches the curve of registry maintenance work.
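A rough sketch of how the registry budget tracks consumption. It takes the registry share as a fraction of the overhead collected, which is one literal reading of the 2% to 5% figure. The 15% overhead rate, the 3% share, and the spend amounts are assumed example values.

```python
from decimal import Decimal


def registry_budget(
    monthly_token_cost: Decimal,
    overhead_rate: Decimal = Decimal("0.15"),   # assumed, within the 10% to 20% range
    registry_share: Decimal = Decimal("0.03"),  # assumed, within the 2% to 5% range
) -> Decimal:
    """Registry funding carved out of the platform overhead collected."""
    overhead_collected = monthly_token_cost * overhead_rate
    return overhead_collected * registry_share


# Invented monthly token spend across all consuming teams, before and after growth.
early = registry_budget(Decimal("100000"))
later = registry_budget(Decimal("400000"))
print(f"registry budget: ${early:,.2f} -> ${later:,.2f}")
```

Because the budget is a fixed fraction of overhead, it grows linearly with total consumption, and no consuming team sees a registry line item.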
Component 4: Bursting buffer fund
AI workloads are bursty. A consuming team running a backfill, evaluation campaign, model migration, or research exploration can spike its monthly bill 10x. Without a buffer, the spike either breaks the consuming team’s budget or lands on the platform team, recreating the failure mode the chargeback was meant to fix.
The bursting buffer fund is a reserve held by the platform team, funded by 3% to 5% layered onto every per-action charge, that absorbs short-term bursts. A team that exceeds its monthly forecast by 3x or more on legitimate exploration can draw from the buffer for that month, with reconciliation at quarter-end.
The buffer is not a free pass. Draws must disclose the cause and are capped per team per quarter. The buffer is sized to absorb 5% to 10% of total monthly inference spend: large enough for real spikes, small enough that the per-action surcharge is not material.
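The draw rules can be sketched as a single eligibility check. Two points are assumptions here rather than anything the rules specify: that a draw covers the overage above forecast, and that the per-team quarterly cap is a fixed dollar amount.

```python
from decimal import Decimal

BURST_MULTIPLE = 3  # a team must reach 3x its monthly forecast to qualify


def buffer_draw(
    forecast: Decimal,
    actual: Decimal,
    cause: str,
    drawn_this_quarter: Decimal,
    quarterly_cap: Decimal,
) -> Decimal:
    """Amount a team may draw from the buffer this month (0 if ineligible)."""
    if not cause.strip():
        return Decimal("0")  # draws must disclose a cause
    if actual < forecast * BURST_MULTIPLE:
        return Decimal("0")  # below the burst threshold
    overage = actual - forecast  # assumption: the buffer covers the overage
    remaining_cap = max(quarterly_cap - drawn_this_quarter, Decimal("0"))
    return min(overage, remaining_cap)


# Invented example: a backfill pushes a $10,000 forecast to $40,000.
draw = buffer_draw(
    forecast=Decimal("10000"),
    actual=Decimal("40000"),
    cause="embedding backfill",
    drawn_this_quarter=Decimal("0"),
    quarterly_cap=Decimal("25000"),
)
print(f"approved draw: ${draw:,.2f}")
```

Whatever the cap does not cover stays on the consuming team's bill, and the draw itself is reconciled at quarter-end.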
The buffer prevents the chargeback from punishing legitimate exploration. A model that taxes bursts at full rate creates an incentive to under-explore, and under-exploration is the most expensive long-run failure mode for AI programs.
Three implementations
The four components can be operated by three different functions.
FinOps-style. Owned by finance, instrumented through cloud cost tooling, charged back through the standard internal billing pipeline. AI inference is tagged at the project level, allocated via tagging discipline, billed monthly. Works for enterprises with mature FinOps practice and strong tagging discipline, where inference runs dominantly on cloud-provider infrastructure. Fails with weak tagging or hybrid cloud-and-self-hosted setups. The lowest-touch but the bluntest: it captures usage signal but rarely captures eval overhead or registry amortization cleanly.
IT-enabled. Owned by IT, layered onto the existing seat-based chargeback with usage adders. The seat fee remains as a baseline; per-action consumption is billed as an adder; eval and registry overhead are funded from the platform team’s central IT budget. Works when IT already runs SaaS internal billing and has the muscle to add new line items, and early in adoption where per-action consumption layers cleanly onto the seat baseline. Fails when IT is procurement-only or when AI consumption dwarfs the baseline. The most common transitional model.
AI-platform-team-led. Owned by the AI platform team, instrumented through the inference gateway, charged back through a dedicated AI billing system. The platform team publishes the per-action price list, eval-share split, registry amortization rate, and buffer surcharge, operating chargeback as a real internal billing relationship. Works for enterprises with a mature AI platform team operating as an internal product organization, significant consumption, and credibility to defend prices. Fails when the platform team is a stealth project with no billing experience. The most aligned with AI economics: it captures all four components cleanly, but it is the most operationally expensive. Mature 2026 AI programs gravitate here.
Picking the right implementation
The right implementation is a function of three factors: AI consumption scale, FinOps maturity, and platform-team operating maturity.
High consumption, high FinOps maturity, high platform-team maturity: AI-platform-team-led. The platform team has the muscle, the consumption justifies the billing system, and the FinOps practice provides the financial integration.
Medium consumption, high FinOps maturity: FinOps-style with eval-share carve-out. Finance runs the per-action chargeback through cloud cost tooling; eval and registry overhead is funded as a separate central budget line.
Low to medium consumption, low FinOps maturity, mature IT: IT-enabled. The seat baseline preserves political continuity, the per-action adder introduces usage signal, and IT runs the billing pipeline that already exists.
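The three cases above can be written as a small decision function. This is a sketch: the three-level scale and all names are illustrative, and any combination the cases do not cover returns None rather than guessing.

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def pick_implementation(
    consumption: Level,
    finops_maturity: Level,
    platform_maturity: Level,
    it_is_mature: bool,
) -> Optional[str]:
    """Map the three named cases to an implementation; None if not covered."""
    if (
        consumption is Level.HIGH
        and finops_maturity is Level.HIGH
        and platform_maturity is Level.HIGH
    ):
        return "AI-platform-team-led"
    if consumption is Level.MEDIUM and finops_maturity is Level.HIGH:
        return "FinOps-style with eval-share carve-out"
    if (
        consumption in (Level.LOW, Level.MEDIUM)
        and finops_maturity is Level.LOW
        and it_is_mature
    ):
        return "IT-enabled"
    return None  # combination not addressed by the three cases


choice = pick_implementation(Level.MEDIUM, Level.HIGH, Level.LOW, it_is_mature=True)
print(choice)
```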
Across all three, the principle is the same: encode usage signal (per-action) and quality signal (eval-share), fund shared infrastructure (registry amortization), absorb burstiness (buffer fund). The implementation choice is operational; the four components are non-negotiable.
The chargeback model is the internal version of the vendor pricing-model question. The same alignment principles (strong usage signal, strong quality signal, low gameability) apply. See "AI project pricing models, ranked by alignment with outcomes" for the vendor-side equivalent.
Frequently asked questions
Why does per-seat chargeback break for AI?
Per-seat chargeback assumes per-user cost variance is small. AI workloads have 20x to 100x variance between light and heavy users on the same seat plan. Light users subsidize heavy users, heavy users hide their consumption, and the platform team carries the gap. No signal reaches either side.
What is per-action chargeback?
Per-action chargeback bills consuming teams for the unit of AI work they run (prompt runs, agent actions, retrieval calls), priced at input plus output tokens times volume, with a small platform overhead per action. The bill scales with consumption, giving the consuming team a real signal to optimize prompt length, model choice, and call frequency.
How should eval costs be allocated between platform and consumer teams?
A 50/50 split is the working default. The platform team owns the eval harness, baseline regression suite, and shared scaffolding, which are treated as platform overhead. The consuming team owns the domain-specific eval set, rubric, and threshold, which are billed back as eval-action chargeback. Splitting evenly ensures neither side under-invests.
Why amortize the prompt registry rather than charge per prompt?
The prompt registry is shared infrastructure with strong network effects. Charging per prompt creates the wrong incentive: teams avoid registering to dodge the chargeback, fragmenting the library. Amortizing maintenance across all teams as a percentage of per-action chargeback preserves the incentive to register while distributing cost fairly.
What is the bursting buffer fund?
A reserve held by the AI platform team, funded by 3% to 5% layered onto every per-action charge, that absorbs short-term bursts. A team running a backfill, evaluation campaign, or model migration can spike its bill 10x; the buffer absorbs the spike and reconciles at quarter-end.
Who should own the chargeback model implementation?
Three implementations work. FinOps-style: owned by finance, instrumented through cloud cost tooling. IT-enabled: owned by IT, layered onto existing seat-based chargeback with usage adders. AI-platform-team-led: owned by the platform team, instrumented through the inference gateway. The right owner depends on which function has the most accurate usage data and the muscle to defend the price list.
How granular should per-action chargeback be?
Granular enough to be honest, coarse enough to be readable. Per-token billing is illegible. Per-action billing, with a published price per action class, translates token economics into the unit consuming teams understand. The platform team refreshes the price list quarterly.
Does per-action chargeback work for agents?
Yes. An agent run is a sequence of actions, and chargeback follows the action graph. The consuming team is billed for the sum of the agent’s prompts, retrieval calls, and tool calls. The per-action price is the same for chat sessions and autonomous agent runs.
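A minimal sketch of billing one agent run. It simplifies the action graph to a flat list of actions, and the price list is hypothetical.

```python
from collections import Counter
from decimal import Decimal

# Hypothetical published price list, in dollars per action.
PRICE_LIST = {
    "prompt": Decimal("0.04"),
    "retrieval_call": Decimal("0.01"),
    "tool_call": Decimal("0.02"),
}


def bill_agent_run(actions: list) -> Decimal:
    """Sum the per-action prices over every action the agent executed."""
    counts = Counter(actions)
    return sum(
        (PRICE_LIST[name] * count for name, count in counts.items()),
        Decimal("0"),
    )


# One invented run: plan, retrieve, call a tool, then answer.
run = ["prompt", "retrieval_call", "tool_call", "prompt"]
total = bill_agent_run(run)
print(f"agent run: ${total}")
```

The same price list would apply whether a human in a chat session or an autonomous agent triggered the actions; only the number of actions differs.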
How does chargeback interact with the eval bar?
Eval runs are charged like any other action set, with the eval-overhead split applied. Consuming teams see eval costs as a monthly line item, which gives a price signal on eval frequency and rubric design. The chargeback becomes the eval-discipline forcing function.
Key takeaways
- Per-seat chargeback breaks for AI because per-user cost variance is high (20x to 100x), quality is a function of ongoing eval maintenance, and consumption is not seat-bounded. No usage signal reaches consuming teams; no funding signal reaches the platform team.
- A working model has four components: per-action chargeback, eval-shared overhead (50/50 platform-consumer split), prompt-registry amortization, bursting buffer fund.
- Three implementations work: FinOps-style, IT-enabled, AI-platform-team-led. The choice depends on consumption scale, FinOps maturity, and platform-team operating maturity.
- The chargeback model is the operational expression of the feature-cost to evaluation-cost shift. Same alignment principles as vendor pricing: usage signal, quality signal, low gameability.
- Chargeback that does not reflect actual consumption is cross-subsidy. Cross-subsidy without a price signal produces under-optimization, eval-discipline degradation, and platform-team under-funding. The four-component model fixes all three.
The chargeback model is the budget-line version of the eval bar. Both encode quality and usage signal into the unit the organization optimizes against. Enterprises that get the chargeback right get the eval discipline right.
Arthur Wandzel