How to Cut AI Agent Costs by 98%: The Workflow Compiler Approach

By FlowStack · July 5, 2026 · 10 min read

Every time your AI agent runs, it thinks. It receives context, reasons about what to do, decides which tools to call, and orchestrates a response. That reasoning is powered by large language models — and LLM inference costs money. Run the same workflow 1,000 times a month and you pay for 1,000 rounds of full reasoning, even when the workflow logic never changes.

There is a better architecture: compile the reasoning once. That is the core idea behind workflow compilers like pflow.

98%: cost reduction claimed by pflow vs traditional agents
~$0: per-run LLM cost after initial compilation
Number of reasoning runs required (upfront)

Why AI agents are expensive

On every run, a traditional AI agent pays for reasoning and routing tokens, system prompt tokens, tool call parsing, and the generation of the actual output.

For a typical customer support agent handling 5,000 tickets per month, LLM reasoning alone can cost hundreds of dollars — before you add the cost of actually generating the responses.
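As a rough sanity check on that figure, the sketch below works out how many reasoning tokens per ticket a given monthly bill would imply. The $0.003 per 1K token price is borrowed from the sales report example later in this article, and the $100 and $300 monthly totals are hypothetical readings of "hundreds of dollars", not measured data.

```python
# Assumption: flat price borrowed from the sales report example in this article.
PRICE_PER_1K_TOKENS = 0.003  # USD
TICKETS_PER_MONTH = 5000     # from the support agent example


def implied_tokens_per_ticket(monthly_cost_usd: float) -> float:
    """Return the reasoning tokens per ticket that would produce this monthly bill."""
    cost_per_ticket = monthly_cost_usd / TICKETS_PER_MONTH
    return cost_per_ticket / PRICE_PER_1K_TOKENS * 1000


# Hypothetical monthly totals, chosen only to illustrate the range.
for monthly_cost in (100, 300):
    tokens = implied_tokens_per_ticket(monthly_cost)
    print(f"${monthly_cost}/month implies about {tokens:,.0f} reasoning tokens per ticket")
```

Under these assumptions, a bill in the hundreds of dollars corresponds to several thousand reasoning tokens per ticket, which is plausible for an agent that re-reads a long system prompt and plans multiple tool calls on every run.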

The traditional vs compiled cost model

Let us use a concrete example: a daily sales report agent. It runs every morning, pulls CRM data, summarises deals, and emails the team. Same workflow every day.

Cost component | Traditional agent (per run) | Compiled workflow (per run)
Reasoning / routing tokens | ~2,000 tokens @ $0.003/1K = $0.006 | $0.000 (pre-compiled)
System prompt tokens | ~1,500 tokens @ $0.003/1K = $0.0045 | $0.000
Tool call parsing | ~3 steps × 500 tokens = $0.0045 | $0.000
Data summarisation (LLM) | ~800 tokens = $0.0024 | ~800 tokens = $0.0024
Total per run | ~$0.0174 | ~$0.0024
Monthly (30 runs) | ~$0.52 | ~$0.072
Monthly (1,000 runs) | ~$17.40 | ~$2.40

In this example, a compiled workflow costs approximately 86% less per run than a traditional agent ($0.0024 vs $0.0174). For workflows where reasoning, system prompt, and tool-parsing tokens make up a larger share of each run, the gap widens toward the 98% figure pflow reports.
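The arithmetic behind the comparison can be reproduced in a few lines. This is a minimal sketch that uses only the token counts and the flat $0.003 per 1K token price from the example; the names `COMPONENTS` and `cost_per_run` are illustrative and not part of pflow.

```python
PRICE_PER_1K_TOKENS = 0.003  # USD, the flat rate the example assumes

# Tokens per run for each cost component: (traditional agent, compiled workflow)
COMPONENTS = {
    "reasoning_routing": (2000, 0),
    "system_prompt": (1500, 0),
    "tool_call_parsing": (3 * 500, 0),
    "data_summarisation": (800, 800),
}


def cost_per_run(column: int) -> float:
    """Sum the USD cost of one run for column 0 (traditional) or 1 (compiled)."""
    tokens = sum(counts[column] for counts in COMPONENTS.values())
    return tokens / 1000 * PRICE_PER_1K_TOKENS


traditional = cost_per_run(0)
compiled = cost_per_run(1)
saving = 1 - compiled / traditional

print(f"traditional: ${traditional:.4f} per run")
print(f"compiled:    ${compiled:.4f} per run")
print(f"saving:      {saving:.0%}")
for runs in (30, 1000):
    print(f"{runs} runs/month: ${traditional * runs:.2f} vs ${compiled * runs:.2f}")
```

Changing the token counts in `COMPONENTS` shows how the saving depends on the share of tokens spent on reasoning: the smaller the remaining summarisation call is relative to the rest, the closer the saving gets to 100%.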

What workflow compilation actually does

A workflow compiler like pflow treats the reasoning phase the same way a traditional compiler treats source code: it converts high-level logic into a lower-level, more efficient representation that can be executed without re-interpreting the source on every run.

Step 1 — Describe the workflow

You write a natural language description of what you want the agent to do. This is your "source code." You run it through the compiler once:

pflow compile "pull yesterday's closed deals from CRM, summarise by rep, email to team"

Step 2 — LLM generates the workflow plan (one-time cost)

The compiler calls an LLM to reason through the workflow — what steps are needed, what tools to call, what data flows where. This is the expensive step. It happens once.

Step 3 — Compiled .pflow.md is saved

The output is a .pflow.md file: a structured Markdown document containing the full workflow plan with explicit steps, tool calls, and conditionals. You commit it to version control.

Step 4 — Execute with near-zero LLM cost

From this point on, running the workflow executes the compiled file directly. The pflow runtime reads the step sequence and executes the tools. LLM calls only happen where genuinely dynamic output is needed — generating the actual email body, for example — not for routing decisions.
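The execution model described in this step can be sketched as a loop over a fixed list of steps. The code below is not pflow's runtime or its file format; `Step`, `run_plan`, the stub tools, and `fake_llm` are all hypothetical names used to illustrate the idea that routing is decided ahead of time and the model is called only for the steps flagged as dynamic.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    """One entry in a precompiled plan (hypothetical structure)."""
    name: str
    tool: str = ""          # key into the tool registry; unused for LLM steps
    uses_llm: bool = False  # True only where dynamic text must be generated


def run_plan(
    plan: List[Step],
    tools: Dict[str, Callable[[Any], Any]],
    llm: Callable[[Any], Any],
) -> Tuple[Any, int]:
    """Run steps in their fixed order; return the final value and the LLM call count."""
    value: Any = None
    llm_calls = 0
    for step in plan:
        if step.uses_llm:
            value = llm(value)
            llm_calls += 1
        else:
            value = tools[step.tool](value)
    return value, llm_calls


# Stub tools standing in for real CRM and email integrations.
def fetch_deals(_: Any) -> List[dict]:
    return [
        {"rep": "Ana", "amount": 1200},
        {"rep": "Ben", "amount": 800},
        {"rep": "Ana", "amount": 300},
    ]


def group_by_rep(deals: List[dict]) -> Dict[str, int]:
    totals: Dict[str, int] = {}
    for deal in deals:
        totals[deal["rep"]] = totals.get(deal["rep"], 0) + deal["amount"]
    return totals


def send_email(body: str) -> str:
    return f"sent: {body}"


def fake_llm(totals: Dict[str, int]) -> str:
    """Stand-in for the one model call that writes the email body."""
    return "; ".join(f"{rep}: {amount}" for rep, amount in sorted(totals.items()))


PLAN = [
    Step("fetch", tool="fetch_deals"),
    Step("group", tool="group_by_rep"),
    Step("write_summary", uses_llm=True),
    Step("email", tool="send_email"),
]
TOOLS = {"fetch_deals": fetch_deals, "group_by_rep": group_by_rep, "send_email": send_email}

result, llm_calls = run_plan(PLAN, TOOLS, fake_llm)
print(result)
print("LLM calls:", llm_calls)
```

In this sketch the four-step plan makes a single model call, for the email body. Fetching, grouping, and sending never touch the model because their order was fixed when the plan was written.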

When does this approach work best?

The compile-once model delivers the highest savings on workflows that are:

It is less useful for:

Implementing this today with pflow

pflow is one accessible implementation of this approach. The CLI is free to install:

uv tool install pflow-cli

Workflow compilation requires your own LLM API key (OpenAI, Anthropic, or compatible). The compilation cost — a one-time inference call — is typically under $0.05 for most workflows.
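To see how quickly a one-time compilation pays for itself, the sketch below computes a break-even point. It assumes the $0.05 upper bound quoted above and the per-run costs of the daily sales report example (the traditional figure is the sum of that example's four rows); `break_even_runs` is an illustrative helper, not a pflow command.

```python
import math

COMPILE_COST = 0.05           # USD, the upper bound quoted for a one-time compilation
TRADITIONAL_PER_RUN = 0.0174  # USD, sum of the traditional-agent rows in the example
COMPILED_PER_RUN = 0.0024     # USD, the summarisation call that remains after compiling


def break_even_runs(compile_cost: float, traditional: float, compiled: float) -> int:
    """Return the number of runs after which compiling is cheaper overall."""
    saving_per_run = traditional - compiled
    if saving_per_run <= 0:
        raise ValueError("compiled runs must be cheaper than traditional runs")
    return math.ceil(compile_cost / saving_per_run)


runs = break_even_runs(COMPILE_COST, TRADITIONAL_PER_RUN, COMPILED_PER_RUN)
print(f"Compilation pays for itself after {runs} runs")
```

With these numbers the daily report recovers its compilation cost within the first week. A workflow that runs only once or twice would not.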

Running a compiled workflow costs only what its tool calls cost, plus any LLM calls for steps that generate dynamic output, such as the email body. There are no additional LLM fees for routing or reasoning.

The bigger picture: AI cost architecture

Reducing per-run costs is one dimension of AI cost optimisation. Others include prompt caching (reusing the KV cache across calls), model tier selection (using smaller models for simpler steps), and batching. Workflow compilation is the most impactful for recurring agentic workflows because it eliminates entire categories of LLM calls rather than just reducing their cost.

As AI agents move from experimental to production, cost architecture will become as important as functionality. Tools that separate the "think once" phase from the "execute many" phase will define the next generation of efficient AI systems.
