axagent experiencelivev0.31.0
cost routing · since v0.27

Your frontier model is doing intern work.

You’d think Claude Code already sends the routine work it spawns to cheaper models. It doesn’t — every sub-task runs on your most expensive model unless something tells it which one to use, and your weekly usage limit dies in hours, not days. ax measures the leak, warns as it happens, learns the fix from your own history, and verifies the savings — all on your laptop.

what the leak looks like on one real machine running ax · 14 days of receipts

$19,270agent spend · 14 days
663sub-tasks spawned
75%ran on the expensive default
28:1expensive-to-cheap spend

$2,301 of sub-task spend on fable/opus vs $83 on sonnet, on this machine. Run ax cost split to see yours.

01 / The loop

Measure. Nudge. Tune. Verify.

Four commands close the loop — see where the expensive defaults hide, get warned before the next one, learn the fix from what your agents actually do, then reprice against the tokens your sub-tasks actually burned.

01measure
ax cost split --days=7

Breaks your spend into main session vs the sub-tasks your agent spawns, by model. The 28:1 above came out of this table — run it to see your ratio.

02nudge
ax hooks install ~/.ax/hooks/route-dispatch.ts --providers=claude

The route-dispatch hook warns in the moment, right as a routine sub-task is about to run on the expensive default.

03tune
ax routing tune

Finds the routine work you keep overpaying for, from what your agents actually do. Deterministic pattern-matching — no AI guessing.

04verify
ax dispatches --candidates

Reprices every overbilled sub-task against what the cheaper model would have cost, from the actual tokens it burned. Your savings, not projections.

02 / The receipts

One machine, verbatim.

Over 30 days: 20 patterns of routine work found, $591.57 of addressable spend. Reprice the last 14 days against the cheaper model and $512.91 is recoverable — the same machine as the numbers above. ax routing tune --dry-run prints yours in one command.

~/Projects/axzsh
$ ax routing tune --dry-run --days=30

20 proposals  addressable spend: $591.57  (30 days)
apply non-judgment: ax routing tune --days=30   brief: ax routing tune --emit-brief

$ ax dispatches --candidates --days=14

total est savings: $512.91
top classes: well-specified-impl ($222.78), spec-review ($69.75), bug-fix ($62.08)

Honest numbers, on purpose: “addressable spend” is what the flagged sub-tasks actually cost over the window — yours included. ax reprices what already happened, from the actual tokens burned, and never reports fabricated projected savings.

03 / The safety rule

Judgment work never tiers down.

Your obvious objection: won’t quality drop? No — ax refuses to auto-route anything that needs taste. Your reviews, your design calls, your plans stay on the frontier model.

tiers down automatically

  • File search & repo recon (billed at opus rates today)
  • Well-specified implementation (the spec did the thinking)
  • Bug fixes with a repro (mechanical once located)
  • Verification & test runs (pass/fail, not judgment)

stays on the frontier model

  • Code review (quality gate, never auto-routed)
  • Design & UX (taste does not tier down)
  • Planning & architecture (the expensive mistakes)
  • Audits (the thing checking the cheap work)
Judgment proposals ship as a brief, not a change. When ax spots judgment-shaped work, it ships the proposal as a written brief your agent stress-tests against your own history before anything applies. Quality stays on the frontier model; only routine work moves.

Route the expensive model where it earns its keep.

Everything runs local. Your transcripts never leave your machine.

$ curl -fsSL https://ax.necmttn.com/install | bash$ ax ingest # first run: build the graph from your history$ ax routing tune