ax · agent experience · live · v0.31.0

what ax actually does

Every scenario, one graph.

Concrete demos of what your local ax instance already exposes. Each one is something you can run today, on your own history.

Backtest a hook against history. Search every session you've ever had. See where your tokens go. Watch a verdict earn its place at +30 sessions. Route the intern work to cheaper models. Keep your plan budget in view. Take proposals mined from your own transcripts. Find out which sessions thrash. Publish your receipts and hand the graph to an agent.

before-you-ship · cases sample output

Ask the graph what your hook would have caught.

You write a guardrail. You don't know if it'll catch real mistakes or just become noise. ax hooks cases scores the candidate against labeled cases from your own session history - true and false positives, a real precision number - so the decision to ship is evidence, not vibes.

~/.ax/hooks/main-branch-guard.ts · candidate
import { Effect } from "effect";
import { defineHook, Verdict, GitEnv } from "@ax/hooks-sdk";
export default defineHook({
  name: "main-branch-guard",
  events: ["PreToolUse"],
  matcher: { tools: ["Bash"] },
  run: (event) =>
    Effect.gen(function* () {
      const cmd = event.tool?.input.command ?? "";
      if (!/^git (push|commit)\b/.test(cmd)) return Verdict.allow;
      const branch = yield* (yield* GitEnv).currentBranch(event.cwd);
      if (/^(main|master|production)$/.test(branch ?? ""))
        return Verdict.block("direct write to protected branch");
      return Verdict.allow;
    }),
});
PreToolUse · Bash · 16 lines · gut-check before the replay
ax · hooks cases · ~/Projects/ax
~/.claude $ ax hooks cases main-branch-guard --since=7
  ↳ replay window   2026-05-21 → 2026-05-28  (7d)
  ↳ sessions        14 claude_code, 3 codex  (17 total)
  ↳ tool_calls      1,247 bash invocations indexed
 
  replaying… ████████████████████ 1247/1247  4.2s
 
  ───────────────────────────────────────────────────────────
  verdict          SHIP · HIGH-CONFIDENCE
  ───────────────────────────────────────────────────────────
  fires             12 / 1,247 calls  (0.96%)
  ├─ true positives  11  would have blocked actual main-branch pushes
  └─ false positives  1  legitimate release → main · 2026-05-24
 
  precision         0.917   recall 0.917   F1 0.917
  prevented rollbacks  5     (traced via post-event reverts)
 
  by repo
    ~/Projects/api       8  ▮▮▮▮▮▮▮▮
    ~/Projects/web       3  ▮▮▮
    ~/Projects/infra     1   ← false positive lives here
 
  one to review:
    sess_8af3·turn-42  release/v2-cutover  → allow-list?
 
  install with: ax hooks install ~/.ax/hooks/main-branch-guard.ts --providers=claude,codex
~/.claude $ 
replay window · 17 sessions · 1,247 bash calls · 2026-05-21 → 2026-05-28 · 12 fires · 5 rollbacks prevented
Thu 21 · 3 sessions · 163 calls
Fri 22 · 2 sessions · 131 calls
Sat 23 · 1 session · 86 calls
Sun 24 · 2 sessions · 112 calls
Mon 25 · 3 sessions · 221 calls
Tue 26 · 3 sessions · 248 calls
Wed 27 · 2 sessions · 177 calls
Thu 28 · 1 session · today · 109 calls
legend: pass · normal traffic | would have blocked · true positive traced to a later rollback | false positive · review
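
The precision figure in the report follows directly from the fire counts. The sketch below shows the arithmetic; it is not ax's scoring code. `scoreReplay` and its field names are hypothetical, and the single false negative is an assumption inferred from the printed recall of 0.917, since the report does not print a miss count.

```typescript
// Confusion counts for a replayed hook.
// `falseNegatives` is an assumption: the report prints recall,
// but not the number of missed cases behind it.
interface ReplayCounts {
  truePositives: number;
  falsePositives: number;
  falseNegatives: number;
}

interface ReplayScores {
  precision: number;
  recall: number;
  f1: number;
}

function scoreReplay(c: ReplayCounts): ReplayScores {
  const fired = c.truePositives + c.falsePositives;
  const actual = c.truePositives + c.falseNegatives;
  // Guard empty denominators so a hook that never fires scores 0, not NaN.
  const precision = fired === 0 ? 0 : c.truePositives / fired;
  const recall = actual === 0 ? 0 : c.truePositives / actual;
  const f1 =
    precision + recall === 0
      ? 0
      : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// 11 true positives and 1 false positive are taken from the report;
// the 1 false negative is inferred, not printed by ax.
const replayScores = scoreReplay({
  truePositives: 11,
  falsePositives: 1,
  falseNegatives: 1,
});
console.log(
  replayScores.precision.toFixed(3),
  replayScores.recall.toFixed(3),
  replayScores.f1.toFixed(3),
);
```

With 12 fires and 1 false positive, precision is 11 / 12 ≈ 0.917, which matches the report.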

search the graph

Find what you shipped last time you did this.

Every transcript ax has ever ingested is full-text searchable - Claude Code, Codex, every turn, every tool call, every reasoning text. Ranked excerpts come back with the session, the file, the commit, and whether it stuck.

14,832 turns · 412 sessions · claude + codex
4 matches · 38 ms
01 · 0.94

Built the OAuth refresh token rotation. The middleware now checks expiry with <= not < after the bug we hit last quarter - tests cover the boundary tick and the clock-skew window.

claude code · session 5a8e9c · 2026-05-21 · 14:02 · ~/Projects/ax · src/auth/middleware.ts
→ shipped in 8b3d1f4 · adopted · t + 7d
02 · 0.81

PR #847 - OAuth refresh path. Tests cover both expiry edge cases; the middleware guards against double-refresh by holding a per-tenant lock for the duration of the rotation.

claude code · session 3c1d22 · 2026-04-14 · 09:48 · ~/Projects/ax · src/auth/refresh.ts
→ shipped in 2e0a5cc · adopted · t + 30d
03 · 0.72

Initial OAuth wiring. Note for future me: don't reuse the access token endpoint for refresh - separate route, separate rate limit, separate audit log.

codex · session 7f4b88 · 2026-03-02 · 22:11 · ~/Projects/ax · src/auth/routes.ts
→ shipped in 9d1e0a2 · adopted · locked · t + 90d
04 · 0.41

Spike on OAuth session-binding inside the middleware - rejected, returned to the PR #420 approach. Leaving the diff in scratch/ in case the threat model changes.

claude code · session 1a2b33 · 2025-12-08 · 16:30 · ~/Projects/ax · scratch/oauth-bind.ts
→ rolled back · rejected · t + 2d

ax · local taste & telemetry graph · prototype

see the bleed · token-impact

Where your agent context goes.

Every agent user is bleeding money on cache misses they can't see. ax insights token-impact --since=7d joins your local claude + codex transcripts, reconciles provider metadata against transcript bytes, and shows the spend, the hit rate, and the workflows burning the budget.

tokens · 7d
14.2M
▲ +20%  vs 11.8M last week
claude 8.5M · codex 5.7M
spend · 7d
$42.18
claude $24.18 · codex $18.00
cache hit · 7d
67%
▼ -4pp WoW · target 80%

By workflow epoch & expensive sessions

join: session_token_usage ⋈ session_health
gsd · 42%
superpowers · 31%
ad-hoc · 27%
legend: cached | cache miss (paid)

Bar length = share of total tokens. Color split inside each bar = cached vs. paid for the same workload. ad-hoc is about two-thirds the tokens of gsd (27% vs. 42%) but burns more dollars - fewer rituals, lower cache hit.

session 9c2e44 · claude · 2.40M tk
~/Projects/ax src/ingest/transcripts.ts refactor
cache hit 41% · $7.81 · 14 turns
session 4f1ab0 · codex · 1.85M tk
~/Projects/ax insights CLI scaffold
cache hit 58% · $5.94 · 22 turns
session a07e91 · claude · 1.31M tk
~/Projects/ax schema v3 migration
cache hit 79% · $3.12 · 9 turns
session 2bf330 · codex · 1.07M tk
~/Projects/ax docs/landing rewrite
cache hit 36% · $3.78 · 31 turns
session 7d4c12 · claude · 0.92M tk
~/Projects/api live-traces vendor
cache hit 74% · $2.34 · 11 turns
3.2×
codex burns 3.2× the context of claude code for equivalent work - same workflow_epoch, same repo, same outcome. Most of that is restated history per turn.
workflow-impact says the gsd → superpowers migration is paying off · run ax insights workflow-impact for the cohort comparison
where the numbers come from
ax reads provider metadata - cache_creation_input_tokens, cache_read_input_tokens, input_tokens, output_tokens - and falls back to transcript-byte estimates when a turn predates cache reporting.
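
To make the hit-rate number concrete, here is a minimal sketch that derives a cache hit rate from the four metadata fields named above. The formula (cached reads over all input tokens) is an assumption, as are the `TurnUsage` and `cacheHitRate` names; the page does not state how ax defines the rate, and the sample turns are invented.

```typescript
// Per-turn usage as reported by the provider. The cache fields are optional
// because older turns may predate cache reporting.
interface TurnUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

// Assumed definition: tokens served from cache divided by all input tokens
// (uncached input + cache writes + cache reads). Output tokens are ignored.
function cacheHitRate(turns: TurnUsage[]): number {
  let cachedReads = 0;
  let totalInput = 0;
  for (const turn of turns) {
    const read = turn.cache_read_input_tokens ?? 0;
    const written = turn.cache_creation_input_tokens ?? 0;
    cachedReads += read;
    totalInput += turn.input_tokens + read + written;
  }
  return totalInput === 0 ? 0 : cachedReads / totalInput;
}

// Invented sample: the first turn writes the cache, the second reads it back.
const sampleTurns: TurnUsage[] = [
  { input_tokens: 1000, output_tokens: 200, cache_creation_input_tokens: 4000 },
  {
    input_tokens: 500,
    output_tokens: 300,
    cache_creation_input_tokens: 500,
    cache_read_input_tokens: 4000,
  },
];
console.log(cacheHitRate(sampleTurns)); // 4000 cached of 10000 input → 0.4
```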
runs on your machine
Local SurrealDB instance. Typed Effect pipeline. No outbound calls, no upload. Sibling diagnostics: cache-health · workflow-impact · skill-impact

the compounding part

Every change earns its place by session 30.

Accepting a proposal doesn't make it true. ax turns each acceptance into an experiment with three forward-looking checkpoints — t+3, t+10, t+30 sessions — and watches the next runs to see if the change actually held. Days are the wrong unit when an agent ships eight sessions a day. The verdict at t+30 sessions is locked. Future proposals know.

Fig · S-04 · verdict timeline · post-feature-verify
accept · experiment opened
exp_id post-feature-verify · t0
marker added · src/cli/run.ts:42
watching marker · file · pattern · tests
pending

ax doesn't trust the moment you accept — it earns the verdict by watching what happens across the next 30 sessions. Marker still landed? File still healthy? Pattern not recurring? Tests still green? Each checkpoint joins evidence from the same graph that generated the proposal. Sessions, not days — a weekend doesn't artificially delay; a productive afternoon doesn't artificially rush. The verdict at +30 sessions is locked and feeds the next round. Verdicts live in the improve queue — ax improve verdict confirms or overrides one from the CLI.
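
Counting checkpoints in sessions rather than days can be sketched in a few lines. This is an illustration of the idea only; `checkpointStatus` and `isLocked` are hypothetical names, not ax's API, and only the 3 / 10 / 30 session marks come from the text above.

```typescript
// Checkpoints are counted in sessions since the proposal was accepted.
const CHECKPOINT_SESSIONS: number[] = [3, 10, 30];

interface CheckpointStatus {
  checkpoint: number;        // session count at which evidence is joined
  due: boolean;              // true once that many sessions have run
  sessionsRemaining: number; // 0 when due
}

function checkpointStatus(sessionsSinceAccept: number): CheckpointStatus[] {
  return CHECKPOINT_SESSIONS.map((checkpoint) => ({
    checkpoint,
    due: sessionsSinceAccept >= checkpoint,
    sessionsRemaining: Math.max(0, checkpoint - sessionsSinceAccept),
  }));
}

// The verdict locks at the last checkpoint, however many days that took.
function isLocked(sessionsSinceAccept: number): boolean {
  return sessionsSinceAccept >= Math.max(...CHECKPOINT_SESSIONS);
}

// Two sessions after accept: the first checkpoint is one session away.
console.log(checkpointStatus(2));
console.log(isLocked(30));
```

Because the clock is a session counter, an idle weekend leaves `sessionsSinceAccept` unchanged, and eight sessions in one afternoon advance it by eight.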

recent experiments · 5 of 47

  • post-feature-verify · +30 sess · marker landed · 0 rollbacks · 1 dependent · adopted
  • main-branch-guardrail · +10 sess · marker landed · 2 of 4 callsites bypassed · partial
  • skill-ts-default · +3 sess · awaiting first signal · 1 session remaining · pending
  • ingest-regression · +30 sess · pattern not recurred over 30 sessions · tests green · adopted
  • cache-warm-on-start · +10 sess · added 800ms cold start · reverted at session 6 · regressed
verdict states › adopted · regressed · partial · ignored · no_longer_needed

route the intern work · dispatches

Stop paying frontier rates for mechanical dispatches.

Every sub-task your agent spawns inherits your most expensive model unless something says otherwise. ax dispatches --candidates finds the dispatches that ran on fable or opus but matched a mechanical routing class - and reprices each one against the cheaper model, from the tokens it actually burned.

biggest single receipt
$35.18
one dispatch · $50.26 on inherit → sonnet
"Implement Task 3: session map strip"
redirectable · last 2d
$209.59
39 model-less dispatches on fable/opus
matched mechanical routing classes
where the fix fires
2 harnesses
route-dispatch hook · at dispatch time
claude code + codex

Top candidates, repriced

ax dispatches --candidates --days=14
ts           agent_type         description                          suggest            child cost  est savings
06-10 13:30  general-purpose    Implement Task 3: session map strip  claude-sonnet-4-6  $50.26      $35.18
06-11 07:41  general-purpose    Fix ingest run lifecycle             claude-sonnet-4-6  $30.98      $21.69
06-10 07:09  general-purpose    Add deep span instrumentation        claude-sonnet-4-6  $26.41      $18.49
06-10 15:32  general-purpose    Implement P2-T16 skills              claude-sonnet-4-6  $16.29      $11.40
06-11 07:42  general-purpose    Sweep stale 8520 port refs           claude-haiku-4-5   $8.56       $7.70
06-12 06:44  codebase-analyzer  Extract contracts for plan           claude-sonnet-4-6  $6.75       $4.73
top 6 of dozens of candidates in 14d · $99.19 est. savings on these rows alone · "inherit" means no model was specified, so the dispatch rode the expensive default
01 · find
ax dispatches --candidates

Inherited an expensive model + matched a mechanical class. Each row carries a suggested model and the dollars it would have saved.

02 · compile
ax routing compile

Writes the class table to ~/.ax/hooks/routing-table.json - merge-preserving, your own classes survive a regenerate.

03 · fire
route-dispatch hook

Suggests the cheaper model at dispatch time, in Claude Code and Codex. The next "Fix ingest run lifecycle" rides sonnet, not fable.

tune · ax routing tune mines the unmatched expensive dispatches into new classes - two-token prefix clustering, ≥3 members. Mechanical classes auto-apply; judgment-flagged ones (review / design / plan / audit) only ship via an emitted brief and an agent backtest.
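
Two-token prefix clustering with a minimum cluster size can be sketched as follows. This is one plausible reading of the phrase, not ax's implementation: `clusterByPrefix` is a hypothetical name, lowercasing and whitespace tokenization are assumptions, and every sample description except the two that appear in the candidates table is invented.

```typescript
// Group dispatch descriptions by their first two tokens and keep only
// clusters with at least `minMembers` members.
function clusterByPrefix(
  descriptions: string[],
  minMembers: number = 3,
): Map<string, string[]> {
  const clusters = new Map<string, string[]>();
  for (const description of descriptions) {
    // Assumed normalization: lowercase, split on whitespace.
    const tokens = description
      .toLowerCase()
      .split(/\s+/)
      .filter((token) => token.length > 0);
    if (tokens.length < 2) continue; // too short to have a two-token prefix
    const key = `${tokens[0]} ${tokens[1]}`;
    const members = clusters.get(key) ?? [];
    members.push(description);
    clusters.set(key, members);
  }
  // Drop clusters below the membership floor.
  return new Map(
    Array.from(clusters.entries()).filter(
      ([, members]) => members.length >= minMembers,
    ),
  );
}

const unmatchedDispatches = [
  "Implement Task 3: session map strip", // from the candidates table
  "Implement Task 4: churn table",       // invented
  "Implement Task 7: quota cache",       // invented
  "Fix ingest run lifecycle",            // from the candidates table
  "Fix ingest retry loop",               // invented
];
const minedClasses = clusterByPrefix(unmatchedDispatches);
console.log(Array.from(minedClasses.keys()));
```

Here "implement task" reaches three members and survives, while "fix ingest" has only two and is dropped.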

where the numbers come from
Every dispatch row joins the parent tool_call to the child session it spawned. Savings are repriced from the tokens the child actually burned - not a projection, a receipt.
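
Repricing a dispatch from its recorded token counts can be sketched like this. The names `costUsd` and `reprice` are hypothetical, and the prices are placeholders, not real provider rates; they are chosen so the cheaper tier costs 30% of the expensive one, which mirrors the roughly 70% savings ratio visible in the sonnet rows of the table ($35.18 of $50.26).

```typescript
interface TokenUsage {
  input: number;  // input tokens the child session consumed
  output: number; // output tokens the child session produced
}

interface ModelPrice {
  inputPerMTok: number;  // USD per million input tokens (placeholder values)
  outputPerMTok: number; // USD per million output tokens (placeholder values)
}

function costUsd(usage: TokenUsage, price: ModelPrice): number {
  return (
    (usage.input / 1e6) * price.inputPerMTok +
    (usage.output / 1e6) * price.outputPerMTok
  );
}

// Reprice the same token counts under a cheaper model.
function reprice(usage: TokenUsage, from: ModelPrice, to: ModelPrice) {
  const actual = costUsd(usage, from);
  const repriced = costUsd(usage, to);
  return { actual, repriced, savings: actual - repriced };
}

// Placeholder price points, NOT real rates.
const expensiveTier: ModelPrice = { inputPerMTok: 10, outputPerMTok: 50 };
const cheaperTier: ModelPrice = { inputPerMTok: 3, outputPerMTok: 15 };

const receipt = reprice(
  { input: 2_000_000, output: 100_000 },
  expensiveTier,
  cheaperTier,
);
console.log(receipt.actual.toFixed(2), receipt.savings.toFixed(2));
```

The point of the design is that nothing is projected: the token counts are whatever the child session already burned, and only the price table changes.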
there's a whole page on this
The leak, the loop, and 30 days of verbatim receipts from one machine: ax · routing →

measure + tune, live

Your bill, broken out and tunable.

ax studio's /cost view renders the same numbers the CLI prints — the main-vs-subagent spend split, per-model cost, and the dispatch candidates worth routing down — live off your local graph. And routing is regex underneath, so it ships an interactive tuner: edit a class pattern, watch which past dispatches it catches (and which it shouldn't), flag false positives into an exclude list, and save — the route-dispatch hook picks it up live.

[Figure: ax studio · /cost - main-thread routability bars and the interactive routing tuner, with an editable regex pattern, suggested model, and exclude patterns over real dispatch history]
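
The tuner's preview step, matching a class pattern against past dispatches and holding back excluded ones, can be sketched as below. The `RoutingClass` shape and `previewClass` are hypothetical; the page does not show the routing-table.json schema. The pattern, the exclude list, and the third history entry are invented for illustration.

```typescript
// Hypothetical shape for one routing class.
interface RoutingClass {
  name: string;
  pattern: RegExp;   // what the class should catch
  suggest: string;   // model to suggest at dispatch time
  exclude: RegExp[]; // false positives flagged by the user
}

interface Preview {
  caught: string[];   // would be routed to the suggested model
  excluded: string[]; // matched the pattern but hit an exclude
}

function previewClass(cls: RoutingClass, history: string[]): Preview {
  const caught: string[] = [];
  const excluded: string[] = [];
  for (const description of history) {
    // Patterns are used without the `g` flag so test() stays stateless.
    if (!cls.pattern.test(description)) continue;
    if (cls.exclude.some((rule) => rule.test(description))) {
      excluded.push(description);
    } else {
      caught.push(description);
    }
  }
  return { caught, excluded };
}

const bugFixClass: RoutingClass = {
  name: "bug-fix",
  pattern: /^(fix|sweep)\b/i,
  suggest: "claude-sonnet-4-6",
  exclude: [/\b(review|design|plan|audit)\b/i],
};

const preview = previewClass(bugFixClass, [
  "Fix ingest run lifecycle",
  "Sweep stale 8520 port refs",
  "Fix auth design review notes", // invented example of a judgment task
  "Implement Task 3: session map strip",
]);
console.log(preview);
```

Keeping excludes as a separate list, rather than folding them into one larger regex, means a flagged false positive can be added or removed without touching the class pattern.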

know the envelope · quota

Your plan limits, live, everywhere you look.

Claude tells you about your usage limit when you hit it. ax quota reads the same usage endpoint the Claude app does - your 5-hour and 7-day rolling windows, live, with the OAuth token you already have. No new login, no DB, nothing leaves your machine but the one call Claude already makes.

5h · 64% · resets 04:29
7d · 63% · resets 04:59
7d sonnet · 5% · resets 04:59
extra · off · no overage billing past the windows

One cached read, three surfaces

~/.ax/quota-cache.json · 60s ttl
terminal · ax quota
~ $ ax quota

window       used  resets
5h            64%  04:29
7d            63%  04:59
7d sonnet      5%  04:59
extra         off

(fetched 0s ago, live)
claude code statusline · ax quota --statusline
~/Projects/ax · sonnet-4-6 · 5h 64% → 04:29 · 7d 63%

One plain line for the statusLine command. Poll every render - it's the cache answering, not the API.

macOS menubar · ax quota --swiftbar

A SwiftBar/xbar plugin body - the burn rate lives next to the clock. Fetch failures degrade to the stale cache, never a crash in the menubar.

where the numbers come from
The same api.anthropic.com/api/oauth/usage endpoint the Claude app polls, read with your existing Claude Code OAuth token - macOS Keychain first, ~/.claude/.credentials.json fallback. ax never refreshes the token.
runs on your machine
No SurrealDB involved at all - this is the one ax command with zero graph. Responses cache at ~/.ax/quota-cache.json (60s TTL) so statusline and menubar can poll freely without hammering the endpoint.
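
The caching behaviour described here, a 60s TTL with a fallback to the stale value when a fetch fails, can be sketched as a small wrapper. This is an illustration under assumptions: `makeCachedReader` is a hypothetical name, the cache is held in memory rather than in ~/.ax/quota-cache.json so the example stays self-contained, and the fetcher is a stub rather than a real call to the usage endpoint.

```typescript
interface CachedRead<T> {
  value: T;
  stale: boolean; // true when a failed fetch fell back to an expired entry
  ageMs: number;  // how old the returned value is
}

function makeCachedReader<T>(
  ttlMs: number,
  fetcher: () => Promise<T>,
  now: () => number = () => Date.now(),
): () => Promise<CachedRead<T>> {
  let entry: { value: T; fetchedAtMs: number } | undefined;
  return async () => {
    const t = now();
    // Fresh entry: answer from cache without touching the endpoint.
    if (entry !== undefined && t - entry.fetchedAtMs < ttlMs) {
      return { value: entry.value, stale: false, ageMs: t - entry.fetchedAtMs };
    }
    try {
      const value = await fetcher();
      entry = { value, fetchedAtMs: t };
      return { value, stale: false, ageMs: 0 };
    } catch (err) {
      // Fetch failed: degrade to the stale entry if there is one.
      if (entry !== undefined) {
        return { value: entry.value, stale: true, ageMs: t - entry.fetchedAtMs };
      }
      throw err; // nothing cached yet, so the failure has to surface
    }
  };
}

// Stub fetcher standing in for the usage call; the values echo the sample output.
const readQuota = makeCachedReader(60 * 1000, async () => ({
  fiveHourPct: 64,
  sevenDayPct: 63,
}));
readQuota().then((result) => console.log(result));
```

A statusline that polls on every render calls the reader many times a minute, but the endpoint is hit at most once per TTL window.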

the graph talks back · improve from our own graph · 2026-06

Proposals mined from your own transcripts.

ax improve recommend scores improvement proposals out of your transcript graph - each one with an evidence trail and a backtested projected value. Accept one and it becomes a brief an agent acts on. Lint reconciles what actually got applied. Verdicts confirm it or retire it.

17.49 · hook · hook__17b5aaf6aade53e5 · high · 39/wk · origin: system

Route mechanical subagent dispatches to cheaper models

evidence 39 model-less dispatches on fable/opus matched mechanical routing classes in the last 2d; est $209.59 redirectable. Top classes: well-specified-impl ($95.27), bug-fix ($44.59), spec-review ($32.57).

apply axctl improve accept hook__17b5aaf6aade53e5

16.03 · skill · Post-feature verification checklist · high · 26/wk · Feature closure needs stronger same-file follow-up verification.
11.93 · skill · Graph query dogfood checklist · high · 8/wk · Query builders can pass string tests while returning slow or low-signal output.
8.90 · skill · SurrealDB schema change guardrail · high · 3/wk · Schema changes need a tighter migration/apply/query verification loop.

Accept is not the end - it's the experiment

recommend → accept → apply → lint → verdict
recommend · scored, with evidence
accept · .ax/tasks/<id>.md brief
agent applies · like any task file
lint · reconciles guidance
verdict · confirms or retires

Agents write back too - ax improve propose / ax improve analyze let a session file its own proposal mid-run; origin badges keep agent-derived and system-derived suggestions distinguishable.

#1
The top proposal above is the first showcase on this page. The graph mined "route mechanical dispatches to cheaper models" out of its own transcripts - $209.59 redirectable in two days - before it existed as a feature. We accepted the brief; it shipped as dispatch routing.
the loop eating its own output · run ax improve recommend for yours
where the numbers come from
Scores blend frequency, severity, and the impact engine's backtested projected value - what the proposal would have saved or caught over your actual recent history, not a hypothetical.
runs on your machine
Mined from the local graph, applied to your own agent files. Nothing auto-edits: accept emits a brief, an agent does the work, ax improve lint checks it landed. The whole deck - proposals, impact, and past bets measured at +3/+10/+30 sessions - lives in the studio improve dashboard: ax serve.

who's thrashing · churn

Landed, edited, repaired - by source.

Lines of code is a vanity metric until you split it. ax sessions churn --here classifies 30 days of writes into landed vs edit vs repair LOC per provider, counts failed checks, and groups the failures into episodes - so "which sessions thrash" has a number.

Composition of added LOC · 30d

~/Projects/ax · claude / claude-subagent / codex
codex · repair <0.1%
claude-subagent · repair 2.6%
claude · repair 2.4%
legend: landed · survived as written | edit · reworked later | repair · fixing a failed check

The repair sliver is the point - a tiny repair share means checks catch problems before they ship. The edit band is where the real rework hides: claude-subagent reworks a third of everything it writes.

source           sess  fails  episodes  pass  landed             edits           repair
codex              57    467        23    22  +330,730/-150,779  +286/-142       +58/-25
claude-subagent    71     95        29    13  +23,979/-4,157     +13,508/-2,199  +981/-274
claude           1417173                      +117,641/-32,784   +36,455/-9,172  +3,867/-1,550
ax sessions churn --here · 30d window · LOC shown as +added/-removed

What an episode is

failure opens · same-family pass closes · 30min expiry
check fails → episode opens
✗ ✗ same-family failures → join the open episode
same-family pass → episode closes
30 min silence → episode expires
467
codex failed 467 checks in 30 days - 8.2 per session - and still landed 330k LOC with under 0.1% repair share. The failures cluster into just 23 episodes: it thrashes in short windows against the test suite, then lands clean.
claude-subagent is the opposite shape · 1.3 fails/session, 35% edit share
where the numbers come from
Every tool_call that runs a check (tests, typecheck, lint, build) is classified pass/fail by family. LOC written after a failure, touching the same files, counts as repair; later rework of landed lines counts as edit.
runs on your machine
Same local graph as everything else - scope with --here, a specific --project, or one --source. 30d window by default, --since=N to change it.
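
The episode rules in this section (a failure opens, same-family failures join, a same-family pass closes, 30 minutes of silence expires) can be sketched as a single pass over time-ordered check events. The types and `groupEpisodes` are hypothetical, and two details are assumptions: episodes are tracked per check family, and silence is measured from the last same-family check in the episode.

```typescript
type CheckFamily = "tests" | "typecheck" | "lint" | "build";

interface CheckEvent {
  atMs: number;
  family: CheckFamily;
  passed: boolean;
}

interface Episode {
  family: CheckFamily;
  startMs: number;
  endMs: number; // time of the last check seen in this episode
  fails: number;
  closedBy: "pass" | "expiry" | "open";
}

const EXPIRY_MS = 30 * 60 * 1000;

function groupEpisodes(events: CheckEvent[]): Episode[] {
  const sorted = events.slice().sort((a, b) => a.atMs - b.atMs);
  const open = new Map<CheckFamily, Episode>();
  const done: Episode[] = [];
  for (const event of sorted) {
    // Expire an open episode after 30 minutes without a same-family check.
    const current = open.get(event.family);
    if (current !== undefined && event.atMs - current.endMs > EXPIRY_MS) {
      current.closedBy = "expiry";
      done.push(current);
      open.delete(event.family);
    }
    const live = open.get(event.family);
    if (!event.passed) {
      if (live !== undefined) {
        live.fails += 1; // same-family failure joins the open episode
        live.endMs = event.atMs;
      } else {
        open.set(event.family, {
          family: event.family,
          startMs: event.atMs,
          endMs: event.atMs,
          fails: 1,
          closedBy: "open",
        });
      }
    } else if (live !== undefined) {
      live.endMs = event.atMs; // same-family pass closes the episode
      live.closedBy = "pass";
      done.push(live);
      open.delete(event.family);
    }
  }
  // Episodes with no closing event by the end of the window stay "open".
  open.forEach((episode) => done.push(episode));
  return done;
}

const MIN = 60 * 1000;
const episodes = groupEpisodes([
  { atMs: 0, family: "tests", passed: false },
  { atMs: 5 * MIN, family: "tests", passed: false },
  { atMs: 10 * MIN, family: "tests", passed: true },
  { atMs: 60 * MIN, family: "tests", passed: false },
  { atMs: 100 * MIN, family: "tests", passed: false },
]);
console.log(episodes.map((e) => `${e.fails} fails, ${e.closedBy}`));
```

In the sample, five checks with four failures collapse into three episodes, which is how a source can fail hundreds of checks yet show only a few dozen episodes.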

receipts, public · profiles

Publish what you actually ran.

ax profile publish turns your local graph into a public gist - counts, dates, trends, the skills and hooks you really lean on. No transcripts, no code, no paths. The nightly compile ranks everyone who opted in.

leaderboard · /leaders
#  user        tokens
1  @you        1.8B
2  @abuilder   1.2B
3  @cferreira  940M

Boards rebuild nightly from registered gists. Trending skills filter out personal local:* skills - a skill only trends once 2+ builders publish it. See the live boards →
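
The trending rule above, skip personal local:* skills and require at least two distinct builders, can be sketched as follows. `PublishedProfile` is a simplified, hypothetical shape rather than the published JSON format, ranking by builder count is an assumption, and the sample profiles are invented apart from the user handles and the `superpowers:tdd` skill name.

```typescript
// Simplified, hypothetical view of a published profile: just the skill names.
interface PublishedProfile {
  github: string;
  skills: string[];
}

interface TrendingSkill {
  name: string;
  builders: number; // distinct users who published this skill
}

function trendingSkills(
  profiles: PublishedProfile[],
  minBuilders: number = 2,
): TrendingSkill[] {
  const buildersBySkill = new Map<string, Set<string>>();
  for (const profile of profiles) {
    for (const skill of profile.skills) {
      if (skill.startsWith("local:")) continue; // personal skills never trend
      const users = buildersBySkill.get(skill) ?? new Set<string>();
      users.add(profile.github);
      buildersBySkill.set(skill, users);
    }
  }
  return Array.from(buildersBySkill.entries())
    .map(([name, users]) => ({ name, builders: users.size }))
    .filter((skill) => skill.builders >= minBuilders)
    // Assumed ordering: most builders first, then by name for stability.
    .sort((a, b) => b.builders - a.builders || a.name.localeCompare(b.name));
}

const trending = trendingSkills([
  { github: "you", skills: ["superpowers:tdd", "local:scratch"] },
  { github: "abuilder", skills: ["superpowers:tdd", "local:scratch"] },
  { github: "cferreira", skills: ["gsd:plan"] },
]);
console.log(trending);
```

Counting distinct users with a set, rather than raw run counts, keeps one heavy user from pushing a skill onto the board alone.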

~/.ax · ax-profile.json
{
  "v": 1,
  "github": "you",
  "window_days": 30,
  "stats": {
    "sessions": 412,
    "streak_days": 9,
    "tokens": { "total": 1.8e9 },
    "cost_usd": 605
  },
  "rig": {
    "skills": [
      { "name": "superpowers:tdd", "runs": 88 }
    ],
    "hooks": ["enforce-worktree"],
    "routing_table": true
  }
}

Aggregates only - the exact JSON is shown to you for consent before the first publish. Your profile page renders it live.

Hand the graph to an agent

ax mcp · stdio · 17 read-only tools
model context protocol · ax mcp

ax mcp runs a stdio MCP server exposing ax's read-only queries as 17 tools, so an agent can interrogate your graph in-context - recall a past session, pull weighted skills, read a proposal - mid-task. Mutating ops are deliberately not exposed.

  • recall
  • sessions_around
  • session_show
  • skills_weighted
  • skills_by_role
  • skills_roles
  • roles
  • improve_recommend
  • improve_show
  • improve_list
consent first, always
The first publish shows you the exact JSON and asks. State lives in ~/.ax/profile-publish.json; ax profile unpublish deletes the gist and resets it. Nothing leaves your machine until you say yes.
runs on your machine
The MCP server has no native deps and never mutates - it's the same query layer the CLI uses, handed to whatever agent you point at it. The graph stays local; only the answers cross the wire.