The Case for Buying Your Model Gateway and Building Your Prompt Library

Two layers of the AI stack sit immediately adjacent to each other and have opposite sourcing verbs in 2026: the model gateway is unambiguously a buy, and the prompt library is unambiguously a build. The model gateway (the routing, observability, and rate-limiting layer between application code and foundation model APIs) implements a feature set that is identical across organizations. Building it is reinventing a commodity. The prompt library (the version-controlled, evaluated, parameterized prompts the org’s AI features run against) is workload-specific and is where the actual product work has accumulated. Buying it would mean accepting another org’s prompts on a different workload, which produces wrong outputs. The split is one of the cleanest worked examples of the Pillar 3 default (buy the rails, build the moat) and the one most often gotten backwards. This piece argues the case for the split, names the threshold at which each layer becomes worth doing, and explains why most orgs end up with a custom gateway nobody wanted and a prompt library that exists nowhere coherent.

This is a spoke under the AI build-vs-buy-vs-hire decision matrix for 2026. The matrix’s eighth principle (compose: buy the rails, build the moat, hire the judgment) is the abstract version of the argument; this piece is the concrete worked instance for the gateway and prompt library specifically.

Why these two layers, why now

By 2026, the AI infrastructure stack has stabilized enough that most layers have a clear default sourcing verb. Foundation models are buy. Vector indices are buy. Eval harnesses are scaffolding for build. Observability backends are buy. Agent frameworks are buy. The pattern is consistent: layers that are commodity get bought; layers that encode org-specific work get built.

The model gateway and the prompt library are both visible at the same architectural level: both touch every AI feature, both are owned by the AI platform team, and both show up in the same architecture diagrams. Their proximity creates the temptation to source them the same way. That temptation is the error.

They are at the same level but they have opposite moat density. The gateway is plumbing; the prompt library is product. The right move is to source each by its own moat density, not by their architectural adjacency. We covered the underlying logic in stop building AI plumbing, buy the rails, build the moat; this piece names the worked instance.

The model gateway is plumbing

A model gateway sits between application code and foundation model APIs. Its job is unified routing across providers (OpenAI, Anthropic, Google, AWS Bedrock, open-weights models hosted internally), retry and fallback logic when a provider is degraded, rate limiting per team or per feature, observability of every call (latency, tokens, cost), and spend caps the finance team can enforce.
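To make the retry-and-fallback part of that job concrete, here is a minimal sketch of the behaviour a bought gateway provides out of the box. It is an illustration of the primitive, not a suggestion to build one. The provider functions, the `call_with_fallback` name, and the retry count are all hypothetical; real adapters would wrap the vendor SDKs.

```python
from typing import Callable, Sequence

# Hypothetical provider adapter: takes a prompt, returns completion text.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has exhausted its retries."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Provider]],
    retries_per_provider: int = 2,
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, completion)."""
    errors = []
    for name, provider in providers:
        for attempt in range(1, retries_per_provider + 1):
            try:
                return name, provider(prompt)
            except Exception as exc:  # a real gateway would filter error types
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for the example (assumptions, not real SDK calls).
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("provider degraded")


def healthy_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


used, text = call_with_fallback(
    "hello",
    [("primary", flaky_primary), ("secondary", healthy_secondary)],
)
```

Here the primary fails both attempts and the call is served by the secondary. Every gateway product implements some version of this loop, which is the point: nothing in it is specific to one organization.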

The feature set is identical across organizations. Every gateway implementation has the same primitives, the same provider adapters, the same observability schema, the same rate-limit semantics. The differences between LiteLLM, Portkey, and Helicone are matters of UI, cost, and deployment model; they are not differences in what the gateway is doing.

Building a model gateway in-house is a 6-to-12-engineer-month project that produces a worse version of an open-source tool that already exists. The org’s engineers spend that time on plumbing nobody differentiates on, while the prompt library, which is where the differentiation lives, does not get built.

The two costs of building rather than buying:

  • The direct cost of 6 to 12 engineer-months of work
  • The opportunity cost of those engineers not working on the prompt library, the orchestration logic, the eval suite, or the agent-specific tooling that does differentiate

For most orgs the second cost is the larger one. The gateway is the easiest thing to build because the requirements are clear; orgs that build it are often selecting on what is buildable rather than what is valuable. The verb is buy. The 2-to-4-week implementation produces a gateway that does the job; the engineering team turns to actual moat work.

The prompt library is product

A prompt library is the version-controlled, evaluated, parameterized collection of prompts the org’s AI features run against. The library has four components:

  1. The prompts themselves: text plus templating logic, organized by feature
  2. The eval cases attached to each prompt: test inputs, expected outputs or expected properties, scoring rubric
  3. The version history: every change to every prompt, with the eval scores that justified the change
  4. The deployment metadata: which prompt version is in production, which is staged, which is deprecated
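The four components above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a prescribed schema: the class names, the `must_contain` check, the stage names, and the eval scores are all made up for the example.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass(frozen=True)
class EvalCase:
    inputs: dict[str, str]   # test inputs for the template
    must_contain: str        # an expected property of the output


@dataclass(frozen=True)
class PromptVersion:
    version: int
    template: str            # prompt text plus $placeholders
    eval_score: float        # score that justified this change (illustrative)
    note: str


@dataclass
class PromptEntry:
    feature: str
    versions: list[PromptVersion] = field(default_factory=list)
    eval_cases: list[EvalCase] = field(default_factory=list)
    deployment: dict[str, int] = field(default_factory=dict)  # stage -> version

    def render(self, stage: str, **params: str) -> str:
        """Render whichever version the deployment metadata maps to `stage`."""
        wanted = self.deployment[stage]
        version = next(v for v in self.versions if v.version == wanted)
        return Template(version.template).substitute(**params)


entry = PromptEntry(
    feature="invoice_extraction",
    versions=[
        PromptVersion(1, "Extract the total from: $document", 0.81, "baseline"),
        PromptVersion(2, "Extract the invoice total as a number from: $document",
                      0.90, "fewer formatting errors"),
    ],
    eval_cases=[EvalCase({"document": "Total due: 120.00"}, "120.00")],
    deployment={"production": 1, "staged": 2},
)

prod_prompt = entry.render("production", document="Total due: 120.00")
staged_prompt = entry.render("staged", document="Total due: 120.00")
```

In practice the version history would live in git rather than in a list, but keeping the score next to each version shows the idea that every change carries the evidence that justified it.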

Every component is workload-specific. The prompts encode the org’s specific way of asking for invoice extraction, customer-issue classification, code review, document summarization, or whatever else the AI features do. The eval cases are anchored to the org’s specific test data. The version history is a record of the org’s specific iteration journey. The deployment metadata maps to the org’s specific release cadence.

None of that transfers across organizations. Buying a prompt library means importing prompts that were written for someone else’s workload, which produces outputs scored against someone else’s quality bar. The output is approximately correct on demos and approximately wrong on production traffic, which is the worst of both worlds because the failures look like model failures rather than sourcing failures.

The right verb is build. The prompts and the eval cases are the org’s actual AI work product, more so than the orchestration code that calls them. We argued this in stop paying AI agencies for documentation, pay them for evals: the eval suite is the deliverable that survives the engagement, and the prompt library is the artifact the eval suite tests.

Why the inversion happens so often

The natural failure mode is the inversion: building the gateway and not building the prompt library.

Building the gateway happens because gateway requirements are clear and gateway code is satisfying to write. The team can scope it (“we need provider routing, retries, observability”), build it, and demo it. The output is a clean component with a known interface. Engineering culture rewards that kind of work.

Not building the prompt library happens because the prompt library has fuzzy requirements and unsatisfying maintenance overhead. The prompts live in some Python file. The eval cases live in some Notion doc. The version history is git blame. The deployment metadata is implicit in whichever feature flag is currently on. There is no clean component to demo. Engineering culture treats this as not-real-work, even though it is the actual product.

The result, six months in, is an org with a beautiful internal gateway that nobody competes on, and 47 prompts scattered across 12 repos with no consistent eval and no version history. The gateway is overbuilt and the prompt library is underbuilt. The fix is the inversion of the inversion: buy the gateway in 2 to 4 weeks, redirect the freed engineering time to building the prompt library properly.

What the right architecture looks like

The right shape, drawn out:

| Layer | Verb | Tool / Pattern |
|---|---|---|
| Foundation model | Buy | OpenAI, Anthropic, Google, etc. |
| Model gateway | Buy | LiteLLM (self-hosted) or Portkey / Helicone (SaaS) |
| Prompt library: storage | Build (with bought UI for non-engineers) | Code library in monorepo + optional product |
| Prompt library: eval | Build or hire | Promptfoo / Inspect harness, org-written cases |
| Prompt library: version history | Build (use git) | Standard version control |
| Orchestration | Build | LangGraph / Vercel AI SDK as rails |
| Application | Build | Standard product engineering |

The prompt library rows deserve a closer look. The storage layer benefits from a tool: prompt-management products handle versioning, UI for non-engineers, and a clean separation from application code. That part can be bought. What cannot be bought are the eval cases, because those encode the workload. The split inside the prompt library mirrors the broader gateway-vs-library split: storage is plumbing, eval is product.
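A minimal sketch of what "org-written cases" means in code, independent of any particular harness. The `score_prompt` function, the stub model, and the two cases are hypothetical; a real setup would plug the same cases into a tool such as Promptfoo or Inspect and call an actual model.

```python
from typing import Callable


def score_prompt(
    model: Callable[[str], str],
    template: str,
    cases: list[dict],
) -> float:
    """Return the fraction of eval cases whose check passes."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    passed = 0
    for case in cases:
        output = model(template.format(**case["inputs"]))
        if case["check"](output):
            passed += 1
    return passed / len(cases)


# Stub standing in for a model call (assumption: a real call goes via the gateway).
def stub_model(prompt: str) -> str:
    return "billing" if "refund" in prompt.lower() else "other"


template = "Classify this customer issue: {text}"

# The cases are the workload-specific part: inputs and expected properties
# drawn from the org's own traffic.
cases = [
    {"inputs": {"text": "I want a refund"},
     "check": lambda out: out == "billing"},
    {"inputs": {"text": "App crashes on login"},
     "check": lambda out: out == "technical"},
]

score = score_prompt(stub_model, template, cases)
```

The runner is a dozen generic lines that any tool can replace. The `cases` list is the part nobody else can supply.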

The model selection problem the gateway makes possible (testing one prompt across three models and picking the cheapest one that meets the eval bar) is covered in detail in build, buy, or fine-tune: a decision frame for foundation model choices.
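The selection rule itself is short enough to sketch. The model names, prices, and eval scores below are invented for illustration; in practice the scores would come from running the prompt's eval cases against each model through the gateway.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_1k_tokens: float  # illustrative price, not a real quote
    eval_score: float          # fraction of eval cases passed


def cheapest_passing(
    candidates: list[Candidate], eval_bar: float
) -> Optional[Candidate]:
    """Pick the lowest-cost candidate whose score meets the bar, if any."""
    passing = [c for c in candidates if c.eval_score >= eval_bar]
    if not passing:
        return None
    return min(passing, key=lambda c: c.cost_per_1k_tokens)


candidates = [
    Candidate("model-a", cost_per_1k_tokens=0.03, eval_score=0.95),
    Candidate("model-b", cost_per_1k_tokens=0.01, eval_score=0.91),
    Candidate("model-c", cost_per_1k_tokens=0.002, eval_score=0.78),
]

choice = cheapest_passing(candidates, eval_bar=0.90)
no_choice = cheapest_passing(candidates, eval_bar=0.99)
```

With a bar of 0.90 the rule picks `model-b`: the cheapest model fails the bar, and the most expensive one passes but costs more than necessary. If no candidate meets the bar the function returns `None`, which is a signal to improve the prompt rather than to lower the bar.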

Thresholds and exceptions

The argument for buying the gateway holds at most scales but has thresholds.

Below a certain scale (single provider, five or fewer teams using AI, no per-team cost attribution requirement) the gateway is overhead. The provider SDK works fine; the gateway adds operational surface without adding value. The threshold at which a gateway becomes worth its cost: more than one provider, or more than five teams, or a finance team asking for per-team attribution. Below that, skip it.
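Written out as a rule, the threshold is a three-way "or". The function and parameter names are hypothetical; the numbers are the ones stated above.

```python
def gateway_worth_it(
    providers: int,
    teams: int,
    needs_per_team_attribution: bool,
) -> bool:
    """True once any one of the three threshold conditions is met."""
    return providers > 1 or teams > 5 or needs_per_team_attribution


small_org = gateway_worth_it(providers=1, teams=3, needs_per_team_attribution=False)
second_provider = gateway_worth_it(providers=2, teams=3, needs_per_team_attribution=False)
finance_asks = gateway_worth_it(providers=1, teams=2, needs_per_team_attribution=True)
```

Any single condition is enough. An org with one provider and two teams still crosses the line the day finance asks for per-team cost attribution.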

Above a certain scale (extreme custom routing requirements, specific compliance constraints, on-prem-only deployments with bespoke security policies) the gateway becomes harder to buy. Open-source LiteLLM with org-specific patches is usually still the right answer; rebuilding from scratch rarely is.

The argument for building the prompt library has fewer exceptions. The library is workload-specific by definition; there is no scale at which it becomes worth buying. The closest thing to an exception is the early-stage startup whose product is two prompts on top of a foundation model; at that scale, the “library” is two files and the formal structure is overhead. As soon as the product has 10 or more prompts in production, the library is real and the build is on.

Frequently asked questions

What is a model gateway?

The routing, observability, and rate-limiting layer between application code and foundation model APIs. It provides a unified interface across providers, handles retries and fallbacks, exposes spend dashboards, and enforces per-team rate limits. Open-source examples include LiteLLM; commercial examples include Portkey and Helicone. The gateway is plumbing.

What is a prompt library?

The version-controlled, evaluated, parameterized collection of prompts the org’s AI features run against: prompts plus eval cases plus version history plus deployment metadata. The library is workload-specific. The prompts are the org’s actual AI work product, more so than the orchestration code.

Why is the gateway a buy and the prompt library a build?

Opposite moat density. The gateway implements a known feature set identical across organizations; building it is reinventing a commodity. The prompt library is workload-specific work; buying it means accepting another org’s prompts on a different workload, which produces wrong outputs.

Why not just use the SDKs directly?

The SDKs work fine until the org has more than one provider, more than five teams using AI, or any cost-tracking requirement crossing team boundaries. Past that threshold the gateway is doing real work: unified rate limiting, per-team cost attribution, and fallback routing when a provider is degraded.

Should the prompt library be a tool or a code library?

Both. Storage and versioning are best handled by a code library inside the monorepo so prompts ship with calling code. Eval and observability benefit from a tool (Promptfoo, Langfuse). Code for the prompts; tool for the eval suite.

What about prompt management products?

They handle storage, versioning, and a UI for non-engineers, which is real value. They do not handle the workload-specific eval cases that make the library function as a moat. Use them for the surface; build the eval suite separately.

What if the org is too small to need a gateway?

Skip it initially and use provider SDKs directly. The threshold: more than one provider in production, more than five teams sharing AI infrastructure, or finance asking for per-team cost attribution. Below that, the gateway is overhead.

Should the prompt library be open within the company?

Yes, internally. The library compounds when teams can read and contribute across boundaries; one team’s payment-extraction prompt seeds another’s invoice-processing prompt. Hiding it kills the compound.

Can the gateway be self-hosted versus SaaS?

Both are viable. LiteLLM is open-source and self-hostable; Portkey and Helicone offer self-hosted tiers. The decision is the standard SaaS-vs-self-hosted one; both still count as “buying the gateway.”

How does this align with the Pillar 3 manifesto?

The third principle says orchestration is the build layer; the library stores its content. The fourth says eval is build or hire, rarely buy. The eighth says buy-the-rails-build-the-moat. The gateway is rails. The library is moat.

Key takeaways

  • The model gateway is plumbing (provider routing, observability, rate limiting) and is identical across organizations; the verb is buy.
  • The prompt library is product (workload-specific prompts, eval cases, version history, deployment metadata) and is the org’s actual AI work; the verb is build.
  • Two adjacent layers, opposite verbs. The error pattern is sourcing them the same way and getting both decisions wrong.
  • The natural failure mode is building the gateway (because the requirements are clear and the result demos cleanly) and not building the prompt library (because the requirements are fuzzy and the maintenance feels unrewarding).
  • The architectural fix: buy LiteLLM or Portkey or Helicone in 2 to 4 weeks; redirect the freed engineering capacity to building the prompt library, with code for prompts plus a tool for eval.

The split between gateway and library is the single cleanest expression of the Pillar 3 default. It is also the split most commonly gotten backwards, because the gateway is satisfying to build and the library is unrewarding to maintain. The orgs that get it right ship faster, differentiate harder, and free engineering capacity for the work that distinguishes their AI product.

Last Updated: Jun 17, 2026

Arthur Wandzel
