Every time your AI agent runs, it thinks. It receives context, reasons about what to do, decides which tools to call, and orchestrates a response. That reasoning is powered by large language models — and LLM inference costs money. Run the same workflow 1,000 times a month and you pay for 1,000 rounds of full reasoning, even when the workflow logic never changes.
There is a better architecture: compile the reasoning once. That is the core idea behind workflow compilers like pflow.
Why AI agents are expensive
The cost structure of a traditional AI agent looks like this:
- System prompt tokens — loaded on every call (often 500–2,000 tokens)
- Reasoning tokens — the model thinking through what to do next (often 300–1,500 tokens per step)
- Tool call handling — parsing tool outputs and deciding next actions (another 200–800 tokens per step)
- Multi-step chains — a 5-step agent workflow multiplies all of the above by 5
For a typical customer support agent handling 5,000 tickets per month, LLM reasoning alone can cost hundreds of dollars a month, depending on model pricing and chain length. That is before you add the cost of generating the responses themselves.
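The cost components listed above can be turned into a rough per-run estimate. This is a minimal sketch under stated assumptions:

- The system prompt is re-sent on every step, as the list describes.
- Each token count is the midpoint of its listed range.
- Every token is billed at a hypothetical flat $0.003 per 1K tokens, the same rate the sales-report table uses.

`StepTokens` and `agent_run_cost` are illustrative names. They are not part of any library.

```python
from dataclasses import dataclass

# Assumed flat price in USD per 1K tokens (same rate as the sales-report table).
USD_PER_1K = 0.003


@dataclass
class StepTokens:
    """Tokens consumed by one agent step (one LLM call)."""
    system_prompt: int  # re-sent on every call
    reasoning: int      # model working out what to do next
    tool_handling: int  # parsing tool output and choosing the next action


def agent_run_cost(step: StepTokens, steps: int, usd_per_1k: float = USD_PER_1K) -> float:
    """Estimated LLM cost of one run of a multi-step agent, in USD."""
    tokens_per_step = step.system_prompt + step.reasoning + step.tool_handling
    return steps * tokens_per_step * usd_per_1k / 1000


# Midpoints of the ranges listed above, for a 5-step chain.
typical = StepTokens(system_prompt=1250, reasoning=900, tool_handling=500)
per_run = agent_run_cost(typical, steps=5)
monthly = per_run * 5000  # 5,000 tickets per month
print(f"per run: ${per_run:.4f}  per month: ${monthly:.2f}")
```

Under these assumptions a run costs about $0.04, and 5,000 runs cost just under $200 a month. That figure covers reasoning overhead only; generating the responses costs extra.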
The traditional vs compiled cost model
Consider a concrete example: a daily sales report agent. It runs every morning, pulls CRM data, summarises deals, and emails the team. The workflow is the same every day. The figures below assume a flat price of $0.003 per 1K tokens.
| Cost component | Traditional agent (per run) | Compiled workflow (per run) |
|---|---|---|
| Reasoning / routing tokens | ~2,000 tokens @ $0.003/1K = $0.006 | $0.000 (pre-compiled) |
| System prompt tokens | ~1,500 tokens @ $0.003/1K = $0.0045 | $0.000 |
| Tool call parsing | ~3 steps × 500 tokens = $0.0045 | $0.000 |
| Data summarisation (LLM) | ~800 tokens = $0.0024 | ~800 tokens = $0.0024 |
| Total per run | ~$0.0174 | ~$0.0024 |
| Monthly (30 runs) | ~$0.52 | ~$0.072 |
| Monthly (1,000 runs) | ~$17.40 | ~$2.40 |
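The table's arithmetic can be checked with a few lines of Python. The script assumes the same flat $0.003 per 1K tokens for every component. It also assumes that, after compilation, only the data summarisation step still calls the LLM.

```python
PRICE_PER_1K = 0.003  # assumed flat USD price per 1K tokens, as in the table


def cost(tokens: int) -> float:
    """USD cost of a number of tokens at the flat price."""
    return tokens * PRICE_PER_1K / 1000


traditional_tokens = {
    "reasoning/routing": 2000,
    "system prompt": 1500,
    "tool call parsing": 3 * 500,  # 3 steps x 500 tokens
    "data summarisation": 800,
}
# After compilation only the summarisation step still calls the LLM.
compiled_tokens = {"data summarisation": 800}

trad_run = sum(cost(t) for t in traditional_tokens.values())
comp_run = sum(cost(t) for t in compiled_tokens.values())
saving = 1 - comp_run / trad_run

print(f"per run: traditional ${trad_run:.4f}, compiled ${comp_run:.4f}")
for runs in (30, 1000):
    print(f"{runs} runs: traditional ${trad_run * runs:.2f}, compiled ${comp_run * runs:.2f}")
print(f"saving per run: {saving:.0%}")
```

The traditional components sum to $0.0174 per run. The monthly totals therefore come to about $0.52 and $17.40, and the per-run saving to roughly 86%.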
In this example, a compiled workflow costs roughly 86% less per run than a traditional agent. For workflows with heavier reasoning steps, the gap widens further, toward the 98% figure that pflow reports.
What workflow compilation actually does
A workflow compiler like pflow treats the reasoning phase the same way a traditional compiler treats source code: it converts high-level logic into a lower-level, more efficient representation that can be executed without re-interpreting the source on every run.
Step 1 — Describe the workflow
You write a natural language description of what you want the agent to do. This is your "source code." You run it through the compiler once:
```shell
pflow compile "pull yesterday's closed deals from CRM, summarise by rep, email to team"
```
Step 2 — LLM generates the workflow plan (one-time cost)
The compiler calls an LLM to reason through the workflow — what steps are needed, what tools to call, what data flows where. This is the expensive step. It happens once.
Step 3 — Compiled .pflow.md is saved
The output is a .pflow.md file: a structured Markdown document containing the full workflow plan with explicit steps, tool calls, and conditionals. You commit it to version control.
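The compile-once idea in Steps 1 to 3 can be sketched as a cache around an expensive planner. This is not pflow's implementation or file format. For simplicity the plan is stored as JSON rather than as a .pflow.md file. `compile_workflow`, `fake_planner` and the tool names are hypothetical stand-ins, and a real planner would call an LLM API.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

Plan = list[dict]


def compile_workflow(description: str, plan_path: Path,
                     planner: Callable[[str], Plan]) -> Plan:
    """Return the saved plan, calling the expensive planner only if none exists."""
    if plan_path.exists():
        # Already compiled: reuse the stored plan, no LLM call.
        return json.loads(plan_path.read_text())
    plan = planner(description)  # the one-time reasoning step
    plan_path.write_text(json.dumps(plan, indent=2))  # artifact to commit to git
    return plan


planner_calls = 0


def fake_planner(description: str) -> Plan:
    """Hypothetical stand-in for an LLM that turns a description into steps."""
    global planner_calls
    planner_calls += 1
    return [
        {"kind": "tool", "name": "crm.closed_deals", "args": {"since": "yesterday"}},
        {"kind": "tool", "name": "group_by_rep"},
        {"kind": "llm", "name": "write_summary"},
        {"kind": "tool", "name": "email.send", "args": {"to": "team"}},
    ]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sales_report.plan.json"
    for _ in range(3):  # three "runs" of the workflow
        plan = compile_workflow(
            "pull yesterday's closed deals from CRM, summarise by rep, email to team",
            path, fake_planner)

print(f"planner calls: {planner_calls}, steps: {len(plan)}")
```

Across three runs the planner is invoked only once. Later runs read the saved plan instead of reasoning again.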
Step 4 — Execute with near-zero LLM cost
From this point on, running the workflow executes the compiled file directly. The pflow runtime reads the step sequence and executes the tools. LLM calls only happen where genuinely dynamic output is needed — generating the actual email body, for example — not for routing decisions.
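The runtime behaviour in Step 4 can be sketched as a loop over the compiled steps. In this hypothetical sketch, which is not pflow's runtime, each step is either a `tool` step dispatched by name or an `llm` step. Only `llm` steps invoke a model. The tool lambdas and `fake_llm` are stubs standing in for a CRM query, an email API and a real LLM call.

```python
from typing import Any, Callable

Step = dict[str, Any]


def run_plan(plan: list[Step],
             tools: dict[str, Callable[[Any, dict], Any]],
             llm: Callable[[str, Any], Any]) -> tuple[Any, int]:
    """Execute compiled steps in order; return (final result, number of LLM calls)."""
    data: Any = None
    llm_calls = 0
    for step in plan:
        if step["kind"] == "tool":
            # Routing is fixed by the plan: no model decides which tool runs next.
            data = tools[step["name"]](data, step.get("args", {}))
        else:
            # Only genuinely dynamic output (e.g. the email body) reaches the LLM.
            data = llm(step["name"], data)
            llm_calls += 1
    return data, llm_calls


plan = [
    {"kind": "tool", "name": "crm.closed_deals"},
    {"kind": "tool", "name": "group_by_rep"},
    {"kind": "llm", "name": "write_summary"},
    {"kind": "tool", "name": "email.send"},
]
tools = {
    "crm.closed_deals": lambda _, args: [("ana", 1200), ("ben", 800), ("ana", 300)],
    "group_by_rep": lambda deals, args: {
        rep: sum(v for r, v in deals if r == rep) for rep, _ in deals},
    "email.send": lambda body, args: f"sent: {body}",
}


def fake_llm(task: str, data: dict) -> str:
    """Stub for the one dynamic step: turn per-rep totals into email text."""
    return ", ".join(f"{rep}: ${total}" for rep, total in sorted(data.items()))


result, n_llm_calls = run_plan(plan, tools, fake_llm)
print(result, "| LLM calls:", n_llm_calls)
```

The four-step workflow makes a single LLM call, for the summary text. Choosing the next step is a plain loop over the plan rather than a model decision.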
When does this approach work best?
The compile-once model delivers the highest savings on workflows that are:
- Recurring — the same logical sequence runs repeatedly (daily, per event, per request)
- Deterministic in structure — the workflow steps do not change based on novel inputs
- High volume — even a $0.01 per-run saving is significant at 10,000 runs/month
It is less useful for:
- One-off research tasks where the workflow path is genuinely novel each time
- Exploratory conversations where the agent needs to improvise
Implementing this today with pflow
pflow is one readily accessible implementation of this approach. The CLI is free and installs with uv:
```shell
uv tool install pflow-cli
```
Workflow compilation requires your own LLM API key (OpenAI, Anthropic, or a compatible provider). Compilation is a one-time inference call and typically costs under $0.05 per workflow.
After that, a compiled workflow incurs no LLM fees for routing or reasoning. You pay for your tool calls plus any LLM steps that generate genuinely dynamic output, such as the email body.
The bigger picture: AI cost architecture
Reducing per-run cost is one dimension of AI cost optimisation. Others include:

- prompt caching, which reuses the KV cache across calls that share a prompt prefix;
- model tier selection, which uses smaller models for simpler steps;
- batching.

For recurring agentic workflows, workflow compilation is often the most impactful of these. It eliminates entire categories of LLM calls rather than merely making them cheaper.
As AI agents move from experimental to production, cost architecture will become as important as functionality. Tools that separate the "think once" phase from the "execute many" phase will define the next generation of efficient AI systems.