How we got here
This release started as a parity bug. The worktree guards - the hooks that block branch switches and main-branch writes so agents work in linked worktrees - existed only as bash scripts wired into Claude Code's `settings.json`. Codex sessions had no guards at all: different config file, different input contract, zero shared tooling (#252).
The session that was supposed to port the bash scripts pivoted mid-plan: ax already owns a cross-harness hook config layer (`ax hooks config` reads and writes claude/codex/cursor/opencode configs) and a graph full of historical `tool_call` rows. So instead of maintaining two bash dialects, hooks became something ax can own end to end - author once in TypeScript, prove against history, install everywhere. The design and 11-task plan are in the repo (`docs/superpowers/specs/2026-06-10-hooks-sdk-design.md`).
The dogfood loop closed itself during development: while editing Codex's config, the freshly installed SDK write-guard fired inside the very session building it and blocked the edit - the dotfiles repo was on `main`. Both guards then blocked live probes in Codex (`hook: PreToolUse Blocked` on a branch switch and on a file edit), and the first real backtest replayed 10,992 tool calls from the graph with a 0.8% would-block rate before the bash originals were retired.
What changed
`@ax/hooks-sdk` (#252). A hook is one file in `~/.ax/hooks/`:
```typescript
import { defineHook, Verdict, GitEnv } from "@ax/hooks-sdk";
import { Effect } from "effect";

export default defineHook({
  name: "enforce-worktree",
  events: ["PreToolUse"],
  matcher: { tools: ["Bash"] },
  run: (event) =>
    Effect.gen(function* () {
      const git = yield* GitEnv;
      // switchesBranch is a helper (not shown) that detects branch-switching commands.
      if (switchesBranch(event) &&
          (yield* git.isPrimaryTree(event.cwd)))
        return Verdict.block("use a worktree instead");
      return Verdict.allow;
    }),
});
```

The SDK normalizes the harness input shapes (Claude stdin JSON, Codex `turn_id` payloads, legacy env vars) into one typed `HookEvent`, and encodes verdicts per harness (exit 2 + stderr to block, `systemMessage` to warn). A hook that throws fails open - a buggy guard never wedges the agent. `GitEnv` ships live and canned test layers, so guards unit-test without a repo.
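To make the verdict encoding and fail-open behavior concrete, here is a minimal sketch. It is not the SDK's actual code: the `SketchVerdict` and `HarnessResult` types, the `encodeVerdict` and `runFailOpen` names, and the choice to emit `systemMessage` as JSON on stdout are all assumptions made for illustration. Only "exit 2 + stderr to block" and "a throw fails open" come from the description above.

```typescript
// Hypothetical shapes for illustration; the real @ax/hooks-sdk types may differ.
type SketchVerdict =
  | { kind: "allow" }
  | { kind: "warn"; message: string }
  | { kind: "block"; reason: string };

interface HarnessResult {
  exitCode: number;
  stderr: string;
  stdout: string;
}

// Translate a verdict into what a harness would observe from the hook process.
function encodeVerdict(verdict: SketchVerdict): HarnessResult {
  switch (verdict.kind) {
    case "block":
      // Exit code 2 plus a reason on stderr signals a block.
      return { exitCode: 2, stderr: verdict.reason, stdout: "" };
    case "warn":
      // Assumption: the warning travels as a JSON `systemMessage` on stdout.
      return {
        exitCode: 0,
        stderr: "",
        stdout: JSON.stringify({ systemMessage: verdict.message }),
      };
    case "allow":
      return { exitCode: 0, stderr: "", stdout: "" };
  }
}

// Run a hook so that any exception degrades to "allow" (fail open).
function runFailOpen<E>(event: E, hook: (event: E) => SketchVerdict): HarnessResult {
  try {
    return encodeVerdict(hook(event));
  } catch {
    // A buggy guard must never wedge the agent, so a throw is treated as allow.
    return encodeVerdict({ kind: "allow" });
  }
}
```

The trade-off in failing open is that a crashing guard silently stops guarding, which is one reason to unit-test hooks rather than rely on them firing in production.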
Tokenized git-command parsing. The bash guards matched substrings, so `echo git merge` could false-positive and plain `git merge` into a dirty primary tree was silently missed (a latent bug the port surfaced). The SDK parses commands into per-invocation tokens - `git -C <path>` targets resolve correctly, compound commands are checked per segment.
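The difference from substring matching can be sketched as follows. This is a simplified illustration, not the SDK's parser: `splitSegments`, `gitInvocations`, and `GitInvocation` are hypothetical names, and the tokenizer handles only quotes, whitespace, and the `;`, `|`, `&` separators. It does not handle escapes, subshells, redirections, or prefixes such as `sudo` or `VAR=1`.

```typescript
// Hypothetical result shape: one entry per git invocation found in a command line.
interface GitInvocation {
  subcommand: string;
  targetDir: string | undefined; // value of `git -C <path>`, if present
}

// Split a shell command into segments (on ; | &) and each segment into tokens.
function splitSegments(command: string): string[][] {
  const segments: string[][] = [];
  let tokens: string[] = [];
  let current = "";
  let quote: string | null = null;

  const endToken = (): void => {
    if (current !== "") {
      tokens.push(current);
      current = "";
    }
  };
  const endSegment = (): void => {
    endToken();
    if (tokens.length > 0) {
      segments.push(tokens);
      tokens = [];
    }
  };

  for (const ch of command) {
    if (quote !== null) {
      if (ch === quote) quote = null; // closing quote
      else current += ch;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === " " || ch === "\t") {
      endToken();
    } else if (ch === ";" || ch === "|" || ch === "&") {
      endSegment();
    } else {
      current += ch;
    }
  }
  endSegment();
  return segments;
}

// Report git invocations only where `git` is the command word of a segment.
function gitInvocations(command: string): GitInvocation[] {
  const found: GitInvocation[] = [];
  for (const tokens of splitSegments(command)) {
    if (tokens[0] !== "git") continue; // `echo git merge` is skipped here
    let targetDir: string | undefined = undefined;
    let i = 1;
    while (i < tokens.length) {
      const tok = tokens[i];
      if (tok === undefined) break;
      if (tok === "-C") {
        targetDir = tokens[i + 1]; // global option: run as if started in <path>
        i += 2;
      } else if (tok === "-c") {
        i += 2; // global option with a value (name=value); skip both
      } else if (tok.charAt(0) === "-") {
        i += 1; // other global flags, assumed here to take no separate value
      } else {
        break;
      }
    }
    const subcommand = tokens[i];
    if (subcommand !== undefined) found.push({ subcommand, targetDir });
  }
  return found;
}
```

With this shape, `gitInvocations("echo git merge")` yields no invocations, while `gitInvocations("cd /tmp && git -C ../repo checkout main")` yields a single `checkout` targeting `../repo`, which a guard could then resolve against the working directory.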
Three new subcommands. `ax hooks init` scaffolds the workspace, `ax hooks install <file> --providers=claude,codex` fans a hook into both configs idempotently (Codex writes prefer `hooks.json` when present), and `ax hooks backtest <file> --days=14` replays your real history through the hook in-process:
```
backtest: enforce-worktree (last 7d, 1 provider)
  replayed     10,992 tool calls
  would-block  90 (0.8%)
  top projects:
    -Users-necmttn-Projects-ax  8,871 calls  62 blocked
  caveat: state-dependent checks (branch, dirty) used CURRENT repo state.
```

The old feedback-case runner kept its job under a new name: `ax hooks cases` scores known candidates against labeled outcomes (precision/recall); `backtest` replays any hook file. Accepted hook proposals from `ax improve` now point at the SDK path as the preferred implementation.
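For readers who want the two metrics spelled out, the sketch below shows how a would-block rate and precision/recall could be computed. The `LabeledCase` shape and the function names are assumptions for illustration and say nothing about how `ax hooks cases` or `ax hooks backtest` are implemented internally.

```typescript
// Hypothetical labeled case: what the hook did vs. what the label says it should do.
interface LabeledCase {
  hookBlocked: boolean;
  shouldBlock: boolean;
}

// Precision = TP / (TP + FP); recall = TP / (TP + FN). Null when undefined.
function precisionRecall(
  cases: LabeledCase[],
): { precision: number | null; recall: number | null } {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (const c of cases) {
    if (c.hookBlocked && c.shouldBlock) tp++;
    else if (c.hookBlocked) fp++; // blocked something that was fine
    else if (c.shouldBlock) fn++; // missed something that should be blocked
  }
  return {
    precision: tp + fp === 0 ? null : tp / (tp + fp),
    recall: tp + fn === 0 ? null : tp / (tp + fn),
  };
}

// Would-block rate as a percentage rounded to one decimal place.
function wouldBlockPercent(blocked: number, replayed: number): number {
  return replayed === 0 ? 0 : Math.round((blocked / replayed) * 1000) / 10;
}
```

Applied to the sample output, `wouldBlockPercent(90, 10992)` gives `0.8`, matching the reported rate. A backtest yields only this rate because replayed history carries no labels; precision and recall need the labeled outcomes that the cases runner works from.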
CLI split into command-family modules (#240). Phase 2 of the architecture-deepening track: the monolithic `cli/index.ts` broke apart into per-family modules under `cli/commands/`. The master plan and ADR 0012 (parsers as `NormalizedTranscriptBatch` adapters) landed alongside (#255, #250).
SurrealDB 3.0.x ingest fix (#253). SurrealDB 3.0.x aborts `SELECT ... FROM [recordid]` record-list selects with "Specify a database to use"; ingest now materializes via `.map()` instead. Bucket paths are rendered per machine, so transcript pointers survive across hosts.
Share cost accumulation fix (#237). Shared session viewers bucket turn usage onto kept turns, so cost-so-far accumulates correctly when turns are filtered out of a share.
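One way to picture the bucketing is sketched below. This is a guess at the general idea, not the code from #237: the `Turn` and `SharedTurn` shapes are hypothetical, and folding a filtered turn's cost into the next kept turn (or the last kept turn, for trailing filtered turns) is an assumption about the direction of the bucketing.

```typescript
// Hypothetical shapes; cost is in arbitrary units for illustration.
interface Turn {
  id: string;
  cost: number;
  kept: boolean; // false when the turn is filtered out of the share
}

interface SharedTurn {
  id: string;
  cost: number; // own cost plus cost bucketed from filtered neighbors
  costSoFar: number; // running total up to and including this turn
}

function bucketOntoKeptTurns(turns: Turn[]): SharedTurn[] {
  const shared: SharedTurn[] = [];
  let pending = 0; // cost of filtered turns waiting for a kept turn
  let running = 0;

  for (const turn of turns) {
    if (!turn.kept) {
      pending += turn.cost;
      continue;
    }
    const cost = turn.cost + pending;
    pending = 0;
    running += cost;
    shared.push({ id: turn.id, cost, costSoFar: running });
  }

  // Trailing filtered turns have no later kept turn; fold them into the last one.
  const last = shared[shared.length - 1];
  if (last !== undefined && pending > 0) {
    last.cost += pending;
    last.costSoFar += pending;
  }
  return shared;
}
```

Under these assumptions, the final `costSoFar` equals the sum over all turns, kept or not, so the viewer's running total stays correct even when turns are hidden.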
Why it matters
Guardrails for coding agents were write-twice, test-never: a bash script per harness, validated only by waiting for it to fire in production. Now a guard is one typed file with unit tests, a replay of weeks of your own real tool calls before you trust it, and one install command for every harness you run. The data to do this was already in the graph - this release turns it into a test bed.