What it actually costs to embed GenAI in Workday Extend

Most teams arrive at this conversation having read a vendor blog post with eye-watering numbers and a procurement scare story about enterprise AI bills running into hundreds of thousands of euros. Both can be true at extreme volumes. They are not the typical case for Workday Extend, where the AI is usually embedded inside a specific HR or finance process with bounded prompts and predictable usage.

There are a handful of viable paths to GenAI inside Workday Extend, and they have very different cost profiles. Knowing which one you are taking is the first step in any honest estimate.

The paths into GenAI, and which one you are actually on

Path 1: Workday's own AI Gateway. Available with Extend Professional, not Essentials. It is Workday's curated set of AI services, including document intelligence, question and answer over policy content, sentiment analysis, skills cloud integration, and machine learning forecasting. You do not pay per token. The capability is included in your Extend Pro entitlement. The trade-off is that you are using Workday's models for Workday's use cases, which works well when those use cases match yours and poorly when they do not.

Path 2: AWS AI services through Extend Pro. Also Pro-only. Direct integration with Amazon Rekognition, Textract, Comprehend, Translate, Lambda, and EventBridge. You pay AWS rates, charged through a managed AWS account that Workday provisions. Useful when you need image recognition, document parsing, or translation at scale. Costs are usage-based but generally modest for typical HR use cases.

Path 3: External large language models. Calls to OpenAI, Anthropic, Google, or whichever LLM provider you have an enterprise relationship with. Available from both Essentials and Pro. You manage the API keys, the prompts, the safety checks, and the cost. This is the path most clients ask about because it is the one where the marketing noise is loudest.

Essentials or Professional? A short detour

Three things are gated behind Extend Professional that matter for AI work. The AI Gateway. AWS native integration. The Extend Developer Co-Pilot, which uses GenAI to speed up UI and code creation. You also get more production apps, more tenants, and more named contacts.

A practical way to read the choice. If you are testing the platform with a couple of simple internal apps and no real AI ambition, Essentials is fine. If you have any serious GenAI use case on your roadmap, or you are looking at three or more Extend apps in the next eighteen months, Professional is a near-automatic decision. It is likely that Workday will add agent creation to Pro in a future release, which further tips the scale.

How token pricing actually works

External LLMs charge per token. A token is roughly three quarters of a word, with some variation by language and content type. The prices on every provider's website are quoted per million tokens, and they split into two numbers: a price for input tokens (the prompt you send) and a price for output tokens (what the model writes back). Output is usually three to five times more expensive than input.
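To make the per-token arithmetic concrete, here is a minimal sketch of a single-call cost estimate. The function names and the example prices are illustrative assumptions, and the word-to-token conversion is the rough three-quarters rule above, not a real tokenizer.

```python
# Rough rule from the text: one token is about three quarters of a word.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(word_count: int) -> int:
    """Approximate a token count from a word count (not a real tokenizer)."""
    return round(word_count / WORDS_PER_TOKEN)


def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars of one call, given prices per million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Example: a 600-word prompt and a 225-word answer, at assumed prices of
# 3 dollars (input) and 15 dollars (output) per million tokens.
prompt_tokens = estimate_tokens(600)   # 800 tokens
answer_tokens = estimate_tokens(225)   # 300 tokens
print(f"{call_cost(prompt_tokens, answer_tokens, 3.0, 15.0):.4f} dollars per call")
```

Input and output are priced separately on purpose: because output costs more per token, a long answer moves the bill more than a long prompt of the same length.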

As of mid-2026, the public per-million-token pricing from major providers ranges roughly as follows. Frontier models from Anthropic, OpenAI, and Google sit at 3 to 15 dollars for input and 15 to 75 dollars for output. Mid-tier production models sit around 3 to 5 dollars for input and 10 to 20 dollars for output. Smaller, faster models sit at well under one dollar for input and a few dollars for output. The cheaper "mini" models from OpenAI, Anthropic, and Google have come down to fractions of a cent per thousand tokens and remain surprisingly capable for narrowly scoped HR tasks. Check each provider's current pricing page before you commit numbers to a business case, because the floor keeps dropping.

The picture that matters: the cost gap between the largest and smallest models is now roughly fifty to one. Picking the right tier for the task is the biggest cost lever you have. A goal-drafting assistant rarely needs the frontier model.

A worked estimate

Take a realistic example: an Extend app that helps managers draft performance review summaries by pulling check-in data and producing a one-paragraph draft. Usage assumptions: 5,000 employees, two prompts per employee per annual cycle, average input of 800 tokens and output of 300 tokens. That is 5,000 times 2, or 10,000 prompts per cycle, at 800 + 300 = 1,100 tokens each: 11 million tokens per year, of which 8 million are input and 3 million are output.

With a mid-tier model, very roughly: 8 million input tokens at 3 dollars per million plus 3 million output tokens at 15 dollars per million. That is about 24 plus 45, around 70 dollars per year. Even at frontier pricing, the bill stays well under a thousand dollars. With a mini model, it is closer to 5 dollars.
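The same estimate can be reproduced in a few lines, which makes it easy to swap in your own headcount and prices. This is a sketch under the assumptions stated above; the function name is ours, and the frontier figures are simply the top of the price range quoted earlier.

```python
def annual_token_cost(employees, prompts_per_employee, input_tokens,
                      output_tokens, input_price_per_m, output_price_per_m):
    """Yearly cost in dollars; prices are per million tokens."""
    prompts = employees * prompts_per_employee
    input_millions = prompts * input_tokens / 1_000_000
    output_millions = prompts * output_tokens / 1_000_000
    return (input_millions * input_price_per_m
            + output_millions * output_price_per_m)


# Mid-tier rates from the worked example: 3 dollars in, 15 dollars out.
mid_tier = annual_token_cost(5_000, 2, 800, 300, 3.0, 15.0)

# Top of the frontier range quoted earlier: 15 dollars in, 75 dollars out.
frontier_top = annual_token_cost(5_000, 2, 800, 300, 15.0, 75.0)

print(f"mid-tier: {mid_tier:.0f} dollars per year")
print(f"frontier (top of range): {frontier_top:.0f} dollars per year")
```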

The lesson is not that "AI is free". It is that the bill scales with how often you prompt and how big the prompts are. For bounded HR processes with reasonable prompt design, the cost rarely dominates the business case. The build effort and the change management almost always do.

Now scale it up. Imagine an always-on conversational HR agent at a 50,000-employee organisation, making roughly 10 prompts per employee per month, averaging 1,500 input tokens and 800 output tokens per call. That is 500,000 prompts per month, or 6 million per year. Input volume: 6 million times 1,500 equals 9 billion input tokens per year. Output volume: 6 million times 800 equals 4.8 billion output tokens per year. At the same 3 dollars per million input and 15 dollars per million output used in the mid-tier estimate above, that is 9,000 times 3 plus 4,800 times 15, so 27,000 plus 72,000, or 99,000 dollars per year. At the upper frontier rates quoted earlier, the same workload costs three to five times more. Move it to a mini model and you are back inside a few thousand dollars. Token cost only becomes a finance conversation at this kind of scale, and even then the lever is model choice, not whether to do it at all.
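A quick way to see the model-choice lever is to hold the workload fixed and vary only the price pair. The sketch below does that for the agent scenario. The first price pair is the one used in the text; the second is an assumed mini-class rate chosen for illustration, not a quote from any provider.

```python
EMPLOYEES = 50_000
PROMPTS_PER_EMPLOYEE_PER_MONTH = 10
INPUT_TOKENS_PER_CALL = 1_500
OUTPUT_TOKENS_PER_CALL = 800

# (input, output) prices in dollars per million tokens.
PRICE_SCENARIOS = {
    "3 / 15 (rates in the text)": (3.0, 15.0),
    "0.15 / 0.60 (assumed mini-class)": (0.15, 0.60),
}

prompts_per_year = EMPLOYEES * PROMPTS_PER_EMPLOYEE_PER_MONTH * 12
input_millions = prompts_per_year * INPUT_TOKENS_PER_CALL / 1_000_000
output_millions = prompts_per_year * OUTPUT_TOKENS_PER_CALL / 1_000_000

# Same token volumes, different price pairs.
annual_costs = {
    label: input_millions * in_price + output_millions * out_price
    for label, (in_price, out_price) in PRICE_SCENARIOS.items()
}

for label, cost in annual_costs.items():
    print(f"{label}: {cost:,.0f} dollars per year")
```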

One footnote worth raising in the business case. Calling external LLMs from inside Extend may incur additional Workday egress or per-call charges depending on how the connection is routed (direct outbound HTTP, AI Gateway proxying, or a managed integration), so confirm with your Workday account team before signing off the unit economics.

Where the bill actually grows

A few patterns push cost beyond the comfortable range. Large input contexts, where you are summarising 50-page documents repeatedly and the input token count balloons. Conversational agents that hold long context across many turns, so the total token count compounds across the conversation. Retrieval-augmented patterns that re-send the whole knowledge base on every call instead of sending only the few passages retrieved for that query. And free-form user prompting where token length is uncapped and unpredictable.

All of these are solvable with good engineering. None of them are inherent to GenAI in Workday Extend. They are inherent to lazy implementation, anywhere.
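The compounding effect in multi-turn conversations is the easiest of these to underestimate. The sketch below assumes the common stateless pattern in which the application re-sends the full conversation history as input on every turn. The function name and the token counts are illustrative.

```python
def conversation_input_tokens(turns: int, user_tokens: int,
                              reply_tokens: int, system_tokens: int = 0) -> int:
    """Total input tokens billed when each turn re-sends the full history."""
    total = 0
    history = system_tokens
    for _ in range(turns):
        history += user_tokens    # the new user message joins the context
        total += history          # the whole context is sent as input
        history += reply_tokens   # the model's reply joins the context too
    return total


# Ten turns of a 200-token question and a 300-token answer.
with_history = conversation_input_tokens(10, 200, 300)
without_history = 10 * 200
print(with_history, "input tokens with history,", without_history, "without")
```

Billed input grows roughly with the square of the number of turns, which is why truncating or summarising older turns is usually the first fix for a chat-style agent.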

Cost-control levers that actually work

A handful of operational levers do most of the work in keeping the bill predictable.

Pick the right model for the task. The mini and Haiku-class models are sufficient for most HR text generation work. Reserve the frontier models for tasks that genuinely require reasoning depth.

Cap the output. Set a maximum response length. For a feedback nudge, 100 tokens is plenty. Without a cap, models will sometimes produce 500-token answers when 50 would do.

Cache repeated calls. If the same prompt produces the same answer (department onboarding nudges, generic policy summaries), cache the result and skip the call. Separately, most enterprise LLM platforms now support some form of native prompt caching, which reduces the cost of re-sending a long, repeated prompt prefix.

Embed the prompt in the process, not the user input. Letting users write free-form prompts inflates token usage and produces unpredictable results. The pattern we recommend: the user clicks a button, the backend assembles the prompt with their data, the user never sees or controls the prompt itself. This gives you cost control and quality control in one move.
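The button-click pattern might look like the sketch below: the backend fills a fixed template from structured data, bounds how much text goes in, and sets an output cap. The template wording, the character limit, and the `max_output_tokens` field name are all placeholders. Check your provider's API for the actual parameter that limits response length.

```python
MAX_NOTE_CHARS = 2_000  # assumed bound on how much check-in text is sent

PROMPT_TEMPLATE = (
    "Write a one-paragraph performance review summary for a {role} "
    "covering {period}. Base it only on these check-in notes:\n{notes}"
)


def build_request(role: str, period: str, notes: list[str]) -> dict:
    """Assemble the prompt server-side; the user never edits it."""
    joined = "\n".join(f"- {note.strip()}" for note in notes)
    bounded = joined[:MAX_NOTE_CHARS]  # hard cap on input size
    return {
        "prompt": PROMPT_TEMPLATE.format(role=role, period=period, notes=bounded),
        "max_output_tokens": 300,  # placeholder name for the output cap
    }


request = build_request(
    "Financial Analyst", "H1",
    ["Closed quarter-end two days early.", "Mentored a new joiner."],
)
print(request["prompt"])
```

The example passes a role and a period rather than a name, so no personally identifiable data enters the prompt.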

Monitor usage from day one. Whatever your provider, set up dashboards for token consumption per use case, per user group, and per time period. Anomalies surface fast. A user who discovered a prompt loop, or a prompt that pulled in too much context, will show up before the monthly bill does.
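If your provider's dashboards are not granular enough, a small aggregation over your own call logs covers the basics. The sketch below assumes each call is logged as a record with `use_case`, `user`, `input_tokens`, and `output_tokens` fields. The record shape and the five-times-median threshold are illustrative choices, not a standard.

```python
from collections import defaultdict
from statistics import median


def tokens_by_key(records, key):
    """Sum input plus output tokens per value of `key`."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["input_tokens"] + record["output_tokens"]
    return dict(totals)


def flag_outliers(totals, factor=5):
    """Keys whose total exceeds `factor` times the median total."""
    typical = median(totals.values())
    return sorted(k for k, v in totals.items() if v > factor * typical)


records = [
    {"use_case": "review_summary", "user": "u1", "input_tokens": 800, "output_tokens": 300},
    {"use_case": "review_summary", "user": "u2", "input_tokens": 900, "output_tokens": 300},
    {"use_case": "policy_qa", "user": "u3", "input_tokens": 700, "output_tokens": 300},
    {"use_case": "policy_qa", "user": "u4", "input_tokens": 38_000, "output_tokens": 2_000},
]

by_user = tokens_by_key(records, "user")
by_use_case = tokens_by_key(records, "use_case")
print("by use case:", by_use_case)
print("users to review:", flag_outliers(by_user))
```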

“The bill scales with how often you prompt and how big the prompts are. Pick the right model and bound the input, and the cost rarely dominates the business case.”

Where AI Gateway makes more sense than external LLMs

A pragmatic note we make to most clients. If your use case maps onto what AI Gateway already does (document intelligence on uploaded PDFs, sentiment analysis on survey responses, headcount or financial forecasting), use AI Gateway rather than rolling your own with an external LLM. You get a Workday-supported, Workday-secured capability, included in your Pro license, with no token bills to manage and no separate data flow to worry about.

External LLMs come into their own when the use case is bespoke: assistive content generation for performance reviews, custom domain summarisation, conversational interfaces over your own data. The setup work is real but the unit economics are usually better than the marketing scare stories suggest.

What to put in the business case

A finance team will accept a token-cost estimate if it is built from three numbers: expected prompts per period, average tokens per prompt, and provider pricing with a sensible buffer. Add 20 to 30 percent for variation and call it an operational cost line, not a capital expense. Avoid the temptation to forecast costs from headline pricing alone. The real-world unit cost on most HR use cases sits well below the worst-case quote.
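For the business case itself, the three numbers and the buffer reduce to one small calculation. The sketch below uses a single blended price per million tokens, which is a simplifying assumption of ours; split input and output if your ratio is unusual. The example price of 6 dollars is illustrative.

```python
def buffered_cost_range(prompts_per_year, avg_tokens_per_prompt,
                        blended_price_per_m, buffers=(0.20, 0.30)):
    """Return (base, low, high) yearly cost in dollars.

    `low` and `high` apply the 20 and 30 percent buffers to the base.
    """
    base = (prompts_per_year * avg_tokens_per_prompt / 1_000_000
            * blended_price_per_m)
    low, high = (base * (1 + buffer) for buffer in buffers)
    return base, low, high


# 10,000 prompts a year at 1,100 tokens each, assumed blended price of
# 6 dollars per million tokens.
base, low, high = buffered_cost_range(10_000, 1_100, 6.0)
print(f"base {base:.2f}, budget between {low:.2f} and {high:.2f} dollars per year")
```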

One more line that helps: a sentence on which AI governance board has reviewed the use case, what data is being sent, and what mitigations are in place (no personally identifiable data leaving the tenant, audit logs, prompt review). Token cost is the easy question. Internal AI governance is where Extend projects actually slow down, and addressing it in the business case removes a likely blocker.