● mongoose · works

mongoose

A terminal-first developer agent harness — an OpenCode-style TUI — built natively around two ideas: knowledge is first-class, and work produces evidence.

mongoose is the works leg of the platform. Two things are first-class that general-purpose harnesses bolt on later: a knowledge base (meerkat is wired in by default as an MCP server) and evidence (every meaningful step emits a record bound to the git state it ran against). It runs on the developer's machine and feeds zegit — it is not a hosted service or CI system.

Install & run

Build from source and export a provider key; running the bare binary on a real terminal launches the interactive TUI.

terminalbash

export ANTHROPIC_API_KEY=sk-ant-...
go build -o mongoose ./cmd/mongoose

./mongoose run "list all Go files in the repo"   # one-shot
./mongoose                                        # launch the TUI (alias: mg)

Multi-provider

Anthropic, Gemini, OpenAI, and Ollama are supported; mongoose auto-detects whichever providers have keys at startup. The default model is anthropic/claude-sonnet-4-6.

Commands

Command	Description
`mongoose run [prompt]`	Run the agent loop with a prompt. Streams output; writes one evidence record.
`mongoose contract new`	An adversarial interview that writes a task contract to `.mongoose/contract.yaml`.
`mongoose checkpoint`	Inspect / rewind file checkpoints captured before edits (`list`, `restore`).
`mongoose version`	Print the version.

run flags

Flag	Effect
`--model`	Override the model from config.
`--workdir`	Working-directory root for tools.
`--no-mcp`	Disable MCP server connections (skip meerkat).
`--no-evidence`	Disable evidence recording.
`--review`	After the run, an advisory reviewer evaluates the diff against the contract.
`--think`	Enable extended thinking on capable models.
`--budget`	Stop the run once cumulative tokens reach a cap.
`--output-format`	`text` (streamed) or `json` (a single result object).

The TUI

Keybindings include / for slash commands & skills, @ to insert a file path, ctrl+o to switch model, ctrl+p to switch session, and ? for help. Slash commands cover the workflow: /model, /plan / /act, /auto, /contract, /review, /evidence, /commit, /compact, and more.

Model routing & catalogs

Models are provider-qualified (<provider>/<model>). Three tiers — frontier, balanced (the default), and fast — back an auto mode: selecting <provider>/auto routes each turn to the cheapest sufficient tier via a tiny Tier-3 classifier call. On classifier error it falls back to the balanced tier. Auto stays within a single provider.

Each provider hardcodes a small, curated, tiered catalog (one model per tier). The rule is "list, then functionally verify" — a smoke test does a real function-calling round-trip so a retired-upstream model is caught.

Skills

Two kinds share one loader. Prompt skills (Anthropic Agent Skills format — a SKILL.md) inject guidance into the conversation. Action skills (a skill.yaml) run declarative, guarded steps: as a unit, gated once by a risk: level. Both are model-callable (as skill_<name>) and user-invocable as /slash commands. Risk maps straight onto the policy vocabulary: low → ALLOW, review → REQUIRE_REVIEW, block → BLOCK.

Evidence output

mongoose emits one unsigned evidence record per run, accumulating a step per meaningful action, bound to git state (commit_sha + tree_sha, or a dirty flag outside a repo). The record's decision is set from the contract's risk appetite and escalated — never downgraded — if a --review pass fails. The path is printed to stderr after the run.

terminalbash

./mongoose contract new "add rate limiting to the public API"
./mongoose run --review "implement the rate limiter per the contract"
# evidence: .zegit/evidence/<run_id>.jsonl   →  hand to zg

The handoff

That same record is what zg validate / zg evidence evaluate consumes and signs into an AoV. The upgrade from log to proof is just installing zegit. See Core concepts.

Authoritative reference, generated from the component repos. Spot something stale? Tell us.

mongoose

Install & run#

Commands#

run flags#

The TUI#

Model routing & catalogs#

Skills#

Evidence output#