metaswarm

A self-improving multi-agent orchestration framework for Claude Code, Gemini CLI, and Codex CLI. 18 specialized agents coordinate through the full development lifecycle, from GitHub issue to merged PR, with TDD, cross-model adversarial review, and spec-driven development.

Just tell Claude what you want:

$ claude
> Read through https://github.com/dsifry/metaswarm and install it for my project.

Claude reads the documentation, understands your project structure, installs the plugin, and configures everything for your stack. Supports TypeScript, Python, Go, Rust, Java, Ruby, and JavaScript.

Have Gemini CLI or Codex CLI installed? metaswarm can delegate implementation and review tasks to them automatically — the writer is always reviewed by a different model. Setup detects them and offers to enable cross-model orchestration.

Or install directly:

claude plugin marketplace add dsifry/metaswarm-marketplace
claude plugin install metaswarm
# then in Claude Code: /metaswarm:setup

One Prompt. Full App.

Install metaswarm and give Claude Code one prompt. No issue creation required. The system handles the rest.

Set up

mkdir my-app && cd my-app && git init && npm init -y
claude
# > Read through https://github.com/dsifry/metaswarm and install it for my project.

Start building

Run /metaswarm:start-task and describe what you want in plain English. Include a tech stack, Definition of Done items, and where you want human checkpoints. The more specific the spec, the better the agents perform.

Example — adapt for your project

/metaswarm:start-task I want you to build a real-time todo list with AI chat.

Tech stack: Node.js + Hono, React + Vite, SQLite, SSE, Claude SDK.

Definition of Done:
1. CRUD operations for todo items via REST API
2. Persistent storage in SQLite
3. Real-time sync across browser tabs via SSE
4. AI chat that can read and modify todos
5. 100% test coverage on backend
6. CI pipeline for tests and lint

Use the full metaswarm orchestration workflow:
research, plan, design review gate, decompose into work units,
and execute each through the 4-phase loop. Set human checkpoints
after the database schema and after the AI integration.
When all work units pass, create a PR.

The orchestrator takes over. It researches your project, plans the implementation, runs a pre-flight validation checklist, then six agents review the plan in parallel. It identifies external dependencies and prompts you for API keys. It breaks the plan into work units, and executes each through the 4-phase loop: implement with TDD, validate independently (with blocking coverage enforcement), adversarial review against the spec, and commit only after PASS. Quality gates are blocking state transitions — there is no path from FAIL to COMMIT. It pauses for your review at the checkpoints you specified. When everything passes, it creates and shepherds the PR.

You described what you wanted. The system figured out how to build it.

The Problem

Claude Code is good at writing code. It is not good at building and maintaining a production codebase.

Shipping a production codebase needs more than just code. It needs research into what already exists, a plan that fits the codebase, a security review, a design review, tests, a PR, CI monitoring, review comment handling, and someone to close the loop and capture what was learned. That is nine distinct jobs. A single agent session cannot hold all of that context, and it definitely cannot review its own work objectively.

So you end up doing the coordination yourself. You are the orchestrator. You prime the agent with context, tell it what to build, review the output, fix what it missed, create the PR, babysit CI, respond to review comments, and then do it all again for the next feature. The agent is a fast typist, but you are still the project manager.

metaswarm fixes that. It is a full orchestration layer for Claude Code that breaks the work into phases, assigns each phase to a specialist agent, runs blocking reviews by other agents until they approve, and coordinates every handoff, all the way through PR creation and shepherding, integrating with external reviewers like CodeRabbit and Greptile. You describe what you want built. The system figures out how to build it, reviews its own plan, implements it with TDD, shepherds the PR through CI and review, and writes down what it learned for next time.

The Pipeline

Every feature goes through eleven phases. Each phase is handled by a specialist agent (or a group of them). The Issue Orchestrator manages the handoffs.

1. Research: Researcher agent explores the codebase, finds patterns and dependencies
2. Plan: Architect agent creates an implementation plan with tasks
3. Plan Validation: pre-flight checklist covering architecture, deps, API contracts, security, UI/UX, external deps
4. Design Review Gate: PM, Architect, Designer, Security, UX Reviewer, and CTO review the plan in parallel (6 agents)
5. Decompose: break the plan into work units with DoD items, file scopes, and a dependency graph
6. External Dependency Check: identifies required API keys/credentials, prompts you to configure them
7. Orchestrated Execution: per work unit, Implement → Validate → Adversarial Review → Commit (the 4-phase loop); can delegate to Codex/Gemini CLIs
8. Final Review: cross-unit integration check, full test suite, coverage enforcement
9. PR Creation: creates the PR with a structured description and test plan
10. PR Shepherd: monitors CI, handles review comments, resolves threads
11. Close + Learn: extracts learnings back into the knowledge base

The Design Review Gate is the part that surprised me. Six agents review the plan simultaneously, each from a different perspective — including a UX Reviewer that verifies user flows and integration work units. All six have to approve before implementation starts. If they do not agree after three rounds, the system escalates to a human. This catches real problems. Not theoretical ones.
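
The gate's control flow is simple to state. Since metaswarm is implemented as prompts rather than code, the following Python is purely an illustrative sketch of that flow, not anything that ships with the framework; the `review` and `revise` callables are invented for the example.

```python
# Hypothetical sketch of the Design Review Gate: six reviewers evaluate
# the plan in parallel, all must approve, and after three failed rounds
# the gate escalates to a human.
from concurrent.futures import ThreadPoolExecutor

REVIEWERS = ["PM", "Architect", "Designer", "Security", "UX Reviewer", "CTO"]

def design_review_gate(plan, review, revise, max_rounds=3):
    for round_num in range(1, max_rounds + 1):
        with ThreadPoolExecutor(max_workers=len(REVIEWERS)) as pool:
            verdicts = dict(zip(REVIEWERS, pool.map(lambda r: review(r, plan), REVIEWERS)))
        objections = [r for r, ok in verdicts.items() if not ok]
        if not objections:
            return ("APPROVED", plan, round_num)
        plan = revise(plan, objections)  # feed objections back into the plan
    return ("ESCALATE_TO_HUMAN", plan, max_rounds)

# Example: Security objects until the plan mentions threat modeling.
review = lambda r, p: r != "Security" or "threat model" in p
revise = lambda p, objs: p + " + threat model"
print(design_review_gate("initial plan", review, revise))
# -> ('APPROVED', 'initial plan + threat model', 2)
```

The point the sketch makes is structural: approval is unanimous and bounded, and the only exits are APPROVED or a human.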

It Gets Smarter Over Time

metaswarm maintains a JSONL knowledge base in your repo. Patterns, gotchas, architectural decisions, anti-patterns. After every merged PR, the self-reflect workflow analyzes what happened and writes new entries.

But the interesting part is conversation introspection. The system watches your Claude Code session for signals worth capturing as knowledge.

The knowledge base can grow to hundreds or thousands of entries without filling your context window, because agents do not load all of it. bd prime uses selective retrieval, filtered by the files you are touching, the keywords that matter, and the type of work you are doing. You get the five gotchas relevant to the auth middleware you are about to change, not the entire institutional memory of the project.
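
The actual retrieval lives in the BEADS CLI; this Python sketch only illustrates the idea of filtering a JSONL knowledge base by file scope, keywords, and work type. The entry fields (`files`, `keywords`, `type`, `note`) are assumptions for the example, not the real schema.

```python
# Illustrative sketch of selective knowledge retrieval (not bd's real logic).
# Entries are JSONL records; we score each one against the current work
# and return only the most relevant notes.
import json

def prime(jsonl_text, touched_files, keywords, work_type, limit=5):
    relevant = []
    for line in jsonl_text.splitlines():
        entry = json.loads(line)
        file_hit = any(f in entry.get("files", []) for f in touched_files)
        word_hit = any(k in entry.get("keywords", []) for k in keywords)
        type_hit = entry.get("type") == work_type
        score = file_hit * 2 + word_hit + type_hit  # crude relevance score
        if score:
            relevant.append((score, entry["note"]))
    relevant.sort(key=lambda t: -t[0])
    return [note for _, note in relevant[:limit]]

kb = "\n".join([
    json.dumps({"type": "gotcha", "files": ["src/auth/middleware.ts"],
                "keywords": ["auth"], "note": "JWT clock skew breaks tests"}),
    json.dumps({"type": "pattern", "files": ["src/ui/App.tsx"],
                "keywords": ["react"], "note": "Use SSE hook for live sync"}),
])
print(prime(kb, ["src/auth/middleware.ts"], ["auth"], "gotcha"))
# -> ['JWT clock skew breaks tests']
```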

Trust Nothing. Verify Everything.

The hardest problem after getting agents to follow checklists is getting them to honestly report results. A coding agent that says "all tests pass" might have skipped the tests entirely, run the wrong suite, or misread the output. We learned this the hard way: agents self-certify success even when things are broken.

Orchestrated Execution fixes that. For complex tasks with a written spec, the orchestrator breaks the work into work units, each with enumerated Definition of Done items, and runs every unit through a 4-phase loop:

1. Implement

A coding agent builds against the spec using TDD. When it reports "done", the orchestrator does not believe it.

2. Validate

The orchestrator runs tsc, eslint, vitest, and coverage enforcement from .coverage-thresholds.json itself, independently. Quality gates are blocking state transitions, not advisory suggestions. It never asks the coding agent whether the tests passed.

3. Adversarial Review

A fresh review agent checks each DoD item with file:line evidence. Binary PASS/FAIL, not subjective quality vibes. If it fails, a new reviewer is spawned for re-review. No anchoring bias.

4. Commit

Only after adversarial PASS. The commit message includes the verified DoD items. If there is a human checkpoint, the system pauses and waits for you before continuing.

On failure: fix, re-validate, spawn a fresh reviewer (never the same one), and retry up to three times before escalating to a human with the full failure history. The fresh reviewer rule matters: without it, the reviewer checks "did they fix what I found?" instead of independently verifying the contract. Anchoring bias is real, even for AI agents.
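
The loop above amounts to a tiny state machine. Since metaswarm implements it in prompts rather than code, this Python is an illustrative sketch only, and the callable names are assumptions.

```python
# Hypothetical sketch of the 4-phase execution loop: FAIL can only lead
# to a retry (with a fresh reviewer) or to escalation; there is no path
# from FAIL to COMMIT.
def run_work_unit(implement, validate, fresh_reviewer, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        artifact = implement()              # 1. Implement (TDD)
        if not validate(artifact):          # 2. Validate independently
            continue                        #    FAIL -> retry, never commit
        reviewer = fresh_reviewer()         # 3. Adversarial review: a new
        if reviewer(artifact):              #    reviewer every attempt
            return ("COMMIT", attempt)      # 4. Commit only after PASS
    return ("ESCALATE_TO_HUMAN", max_attempts)

# Example: validation passes, but review fails twice before passing.
verdicts = iter([False, False, True])
result = run_work_unit(
    implement=lambda: "diff",
    validate=lambda a: True,
    fresh_reviewer=lambda: (lambda a: next(verdicts)),
)
print(result)  # -> ('COMMIT', 3)
```

Note that `fresh_reviewer` is a factory: each retry constructs a new reviewer, which is the sketch's version of the anchoring-bias rule.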

This is not needed for every task. A typo fix or a small bug does not need a 4-phase loop. But for multi-unit features with a spec, risky schema changes, or anything where "it works, trust me" is not good enough, orchestrated execution is the difference between shipping and hoping.

Cross-Model Adversarial Review

A coding agent reviewing its own output has an inherent bias. metaswarm can delegate implementation and review tasks to external AI tools — OpenAI Codex CLI and Google Gemini CLI — with one rule: the writer is always reviewed by a different model.

Cross-Model Review

If Claude writes the code, Codex or Gemini reviews it. If Codex writes it, Claude or Gemini reviews. The reviewer never shares the writer's biases.

Availability-Aware Escalation

Model A (2 tries) → Model B (2 tries) → Claude (1 try) → user alert. If Codex is down, Gemini takes over automatically.
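
The ladder above is easy to express directly. This Python sketch is hypothetical (the `attempt` function and outcome strings are invented for illustration, not metaswarm's API):

```python
# Illustrative availability-aware escalation: primary model (2 tries),
# secondary model (2 tries), Claude (1 try), then alert the user.
def escalate(attempt, ladder=(("codex", 2), ("gemini", 2), ("claude", 1))):
    for model, tries in ladder:
        for _ in range(tries):
            outcome = attempt(model)
            if outcome == "ok":
                return ("DONE", model)
            if outcome == "unavailable":
                break  # model is down: fall through to the next one
    return ("ALERT_USER", None)

# Example: Codex is down, Gemini succeeds on its second try.
log = iter(["unavailable", "fail", "ok"])
print(escalate(lambda m: next(log)))  # -> ('DONE', 'gemini')
```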

Shell Adapters

Each external tool has a shell adapter with health checks, implement, and review commands. The shared helper library includes timeout handling, worktree management, cost extraction, and error classification.

Opt-In, Per-Project

Configure via .metaswarm/external-tools.yaml. Each adapter is disabled by default. Enable the ones you have installed and authenticated.
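
As a rough illustration of what a per-project opt-in might look like (the field names here are guesses, not the documented format; copy the shipped template for the real schema):

```yaml
# .metaswarm/external-tools.yaml (hypothetical sketch).
# Each adapter is disabled by default; enable only tools you
# have installed and authenticated.
tools:
  codex:
    enabled: false
  gemini:
    enabled: true
```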

This is not about replacing Claude. It is about eliminating the blind spots that any single model has when reviewing its own work. Cross-model review catches different classes of bugs because the models have different training biases.

Visual Review

Agents are good at reading code. They are bad at knowing if the UI looks right. The visual review skill uses Playwright to capture screenshots of your web UI at multiple viewports, then brings those screenshots into the Claude Code conversation for visual inspection.

Use it after implementing UI changes, before creating a PR, or anytime you need to verify that rendered output matches the spec.

Components

18 Agent Personas

Researcher, Architect, PM, Designer, Security, CTO, Coder, Code Reviewer, Security Auditor, PR Shepherd, Test Automator, Knowledge Curator, and more. Each has a defined role, process, and output format.

13 Orchestration Skills

Orchestrated execution, design review gate, plan review gate, PR shepherd, PR comment handling, external AI tools delegation, visual review, brainstorming extension, issue creation, interactive setup, migration, status diagnostics, and the main start workflow.

15 Slash Commands

/metaswarm:setup, /metaswarm:update, /metaswarm:status, /metaswarm:start-task, /metaswarm:start, /metaswarm:prime, /metaswarm:review-design, /metaswarm:brainstorm, /metaswarm:self-reflect, /metaswarm:pr-shepherd, /metaswarm:create-issue, /metaswarm:handle-pr-comments, /metaswarm:external-tools-health.

8 Quality Rubrics

Standardized review criteria for code, architecture, security, test coverage, implementation plans, adversarial plan review, adversarial spec compliance, and external tool review.

Coverage Enforcement

Configurable test coverage thresholds via .coverage-thresholds.json that block PR creation and task completion. Agents cannot ship code that drops coverage. Works with any test runner.

Knowledge Base Templates

Schema and example entries for patterns, gotchas, decisions, anti-patterns, codebase facts, and API behaviors. Seed it with your project's context.

Recursive Orchestration

Swarm Coordinators spawn Issue Orchestrators, which can spawn sub-orchestrators. Complex epics decompose into sub-epics automatically. Swarm of swarms.

Team Mode

Persistent teammates with context retention across sessions. When multiple Claude Code sessions are active, agents automatically coordinate through direct inter-agent messaging. No configuration needed.

Plan Review Gate

3 adversarial reviewers — Feasibility, Completeness, and Scope & Alignment — validate every implementation plan before it reaches the Design Review Gate. All 3 must approve.

Claude-Guided Setup

Auto-detects your language, framework, test runner, linter, formatter, type checker, package manager, CI system, and git hooks. Supports 7 languages, 15+ frameworks, and all major toolchains. Customizes everything interactively.

Self-Update

Run /metaswarm:update to check for new versions, view the changelog, update all component files, and re-detect your project context. User customizations are preserved.

6 Development Guides

Comprehensive reference guides for coding standards, testing patterns, git workflow, worktree development, build validation, and agent coordination. Agents load them automatically when relevant.

Workflow Enforcement

Mandatory intercepts at every handoff point ensure quality gates are never bypassed. After brainstorming → design review gate. After planning → plan review gate. Before PR → self-reflect + knowledge capture. Users choose execution method (orchestrated vs lightweight).

Context Recovery

Approved plans and execution state persist to .beads/ on disk. After context compaction or session interruption, bd prime --work-type recovery reloads everything: the plan, completed work, current position. No re-running expensive review gates.

The Agents

Each agent is a markdown file that defines a persona, responsibilities, process, and output format. They are prompts, not code. You can read them, edit them, and add your own.

| Agent | Phase | What It Does |
| --- | --- | --- |
| Swarm Coordinator | Meta | Assigns work to worktrees, manages parallel execution |
| Issue Orchestrator | Meta | Decomposes issues into tasks, manages phase handoffs |
| Researcher | Research | Explores codebase, discovers patterns and dependencies |
| Architect | Planning | Designs implementation plan and service structure |
| Product Manager | Review | Validates use cases, scope, and user benefit |
| Designer | Review | Reviews API/UX design and consistency |
| Security Design | Review | Threat modeling, STRIDE analysis, auth review |
| CTO | Review | TDD readiness, codebase alignment, final approval |
| Coder | Implement | TDD implementation with 100% coverage |
| Code Reviewer | Review | Dual-mode: collaborative (suggestions) or adversarial (spec compliance) |
| Security Auditor | Review | Vulnerability scanning, OWASP checks |
| PR Shepherd | Delivery | CI monitoring, comment handling, thread resolution |
| Knowledge Curator | Learning | Extracts learnings, updates knowledge base |
| Test Automator | Implement | Test generation and coverage enforcement |
| Metrics | Support | Analytics and weekly reports |
| SRE | Support | Infrastructure and performance |
| Slack Coordinator | Support | Notifications and human communication |
| Customer Service | Support | User support and triage |

Agents Skip Checklists. Gates Don't.

The hardest problem in agent-driven development is not getting agents to write code. It is getting them to maintain standards. You can put "run coverage before pushing" in a checklist. Agents will skip it. They will misread thresholds, run the wrong command, or decide the step does not apply. We shipped multiple PRs with coverage regressions before we accepted that procedural enforcement is not enforcement. It is a suggestion.

The fix is twofold. First, quality gates in the orchestrated execution loop are defined as blocking state transitions, not advisory recommendations. There is no instruction path from FAIL to COMMIT. FAIL always means retry or escalate. Second, deterministic gates: automated checks that block bad code regardless of whether an agent follows instructions. metaswarm supports three enforcement points, all driven by a single config file:

Pre-Push Hook

A Husky git hook that runs lint, typecheck, format checks, and your coverage command before every git push. If coverage drops, the push is rejected. No agent can bypass it.

CI Coverage Job

A GitHub Actions workflow that reads the same config and blocks merge on failure. Even if an agent somehow pushes, it cannot merge.

Agent Completion Gate

The task-completion checklist reads the enforcement command from config. The weakest gate on its own, but combined with the other two, coverage regressions are caught at every level.

One Config File

.coverage-thresholds.json defines your thresholds and enforcement command. All three gates read from it. Change your test runner once, all gates update automatically.
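
As an illustration of the shape this file might take (the field names are hypothetical; copy the shipped template rather than this sketch):

```json
{
  "command": "pnpm test:coverage",
  "thresholds": {
    "lines": 90,
    "branches": 85,
    "functions": 90,
    "statements": 90
  }
}
```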

The guided setup (/metaswarm:setup) detects your test runner and configures coverage enforcement automatically. For manual setup, copy the template:

cp templates/coverage-thresholds.json .coverage-thresholds.json

The thresholds work with any test runner. The setup skill maps your detected test runner to the correct coverage command automatically — pnpm test:coverage, pytest --cov, cargo tarpaulin, go test -cover, or whatever your project uses. See coverage-enforcement.md for the full setup guide.
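
The mapping the setup skill performs can be pictured as a simple lookup. This Python sketch is illustrative only, using the commands named above; the real detection lives in the setup skill.

```python
# Illustrative mapping from detected test runner to coverage command.
COVERAGE_COMMANDS = {
    "vitest": "pnpm test:coverage",
    "pytest": "pytest --cov",
    "cargo": "cargo tarpaulin",
    "go": "go test -cover",
}

def coverage_command(runner):
    try:
        return COVERAGE_COMMANDS[runner]
    except KeyError:
        raise ValueError(f"no known coverage command for {runner!r}")

print(coverage_command("pytest"))  # -> pytest --cov
```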

Set It Up

Prerequisites

Claude Code

claude plugin marketplace add dsifry/metaswarm-marketplace
claude plugin install metaswarm

Then: /metaswarm:setup

Gemini CLI

gemini extensions install https://github.com/dsifry/metaswarm.git

Then: /metaswarm:setup

Codex CLI

curl -sSL https://raw.githubusercontent.com/dsifry/metaswarm/main/.codex/install.sh | bash

Then: $setup

Cross-Platform (auto-detect)

npx metaswarm init

Detects which CLIs you have installed and sets up metaswarm for each.

Setup detects your project's language, framework, test runner, linter, formatter, type checker, package manager, CI system, and git hooks. It creates the appropriate instruction file (CLAUDE.md, GEMINI.md, or AGENTS.md), configures coverage thresholds, and sets up .gitignore. Supports: TypeScript, Python, Go, Rust, Java, Ruby, JavaScript.

Updating

Claude Code: /metaswarm:update • Gemini CLI: /metaswarm:update • Codex CLI: cd ~/.codex/metaswarm && git pull

Upgrading from an Older Version

If you installed metaswarm via npx metaswarm init (v0.6–v0.8), upgrade to the plugin:

  1. Install the plugin: claude plugin marketplace add dsifry/metaswarm-marketplace && claude plugin install metaswarm
  2. Open Claude Code and run /metaswarm:migrate — this removes redundant npm-installed copies (your project files are never touched)
  3. Run /metaswarm:status to verify everything is clean
  4. Review and commit the cleanup

See INSTALL.md for the full upgrade guide.

How It Actually Works

Under the hood, this is all prompts and BEADS task tracking. No custom runtime. No server. No dependencies beyond Claude Code and the bd CLI.

Agent definitions are markdown files

Each agent in agents/ is a prompt that defines a role, responsibilities, and process. When the orchestrator needs a researcher, it spawns a subagent with that prompt. The agent does its work, returns results, and the orchestrator moves to the next phase. You can read every agent definition. You can edit them. You can add new ones.

BEADS tracks the work

Every feature starts as a BEADS epic with subtasks. Dependencies between tasks enforce ordering. The orchestrator checks bd ready to find unblocked work, updates task status as agents complete phases, and closes the epic when the PR merges. All of this is stored in a SQLite database inside your repo, synced through git.

Knowledge base is selective, not exhaustive

When an agent starts work, bd prime loads knowledge filtered by the files being touched and the type of work being done. A coder working on auth routes gets auth-related gotchas. A security reviewer gets the OWASP-related patterns. The knowledge base can grow to thousands of entries without any agent needing to read all of them.

Human checkpoints are proactive, not reactive

The design review gate gives agents three iterations to converge. If they cannot agree, or if requirements are ambiguous, the system stops and asks a human. But orchestrated execution goes further: you define planned checkpoints in the spec (after schema changes, after security-sensitive code, at natural boundaries). The orchestrator pauses at those points and presents a report. It waits for you. This is not a notification. It is a gate. The system does not guess when it should ask, and it does not continue without your explicit approval.

Works With Your Code Reviewers

metaswarm does not replace your automated code review tools. It works with them. The PR Shepherd agent monitors incoming review comments from whatever tools you have configured and handles them systematically.

Supported review tools

Out of the box, the PR comment handling skill knows how to parse and respond to comments from automated reviewers such as CodeRabbit and Greptile.

The handling workflow categorizes each comment by priority, determines if it is actionable or out-of-scope, addresses the actionable ones, and resolves the threads. Comments from automated reviewers also feed into the self-reflect loop. When CodeRabbit catches something three times, that becomes a knowledge base entry so agents stop making that mistake.

The GTG Merge Gate

The last piece of the PR lifecycle is knowing when a PR is actually ready to merge. That is what GTG (Good-To-Go) does. It is a single CLI and GitHub Action that consolidates everything into one deterministic check:

# PR Shepherd polls this until it returns READY
gtg 42 --format json --exclude-checks "Merge Ready (gtg)"

The PR Shepherd agent uses GTG as its primary readiness signal. When GTG reports CI_FAILING, the shepherd investigates and fixes. When it reports ACTION_REQUIRED, it addresses review comments. When it reports UNRESOLVED_THREADS, it resolves them. When it returns READY, it notifies a human for final merge approval.
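
The shepherd's reaction to each GTG status is essentially a dispatch table. A hypothetical Python sketch, using the status strings from the description above (the handler descriptions are invented):

```python
# Illustrative dispatch on GTG readiness statuses (handlers are hypothetical).
def shepherd_step(status):
    actions = {
        "READY": "notify human for final merge approval",
        "CI_FAILING": "investigate and fix the failing checks",
        "ACTION_REQUIRED": "address review comments",
        "UNRESOLVED_THREADS": "resolve open review threads",
    }
    return actions.get(status, "poll gtg again")

print(shepherd_step("CI_FAILING"))  # -> investigate and fix the failing checks
```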

You set this up as a GitHub Action in your repo. The templates/ directory includes the workflow file. Combined with your repo's branch protection rules, this gives you a fully automated quality gate that agents cannot bypass.

Built On

BEADS by Steve Yegge. Git-native, AI-first issue tracking. The coordination backbone for all task management, dependency tracking, and knowledge priming. BEADS made it possible to treat issue tracking as part of the codebase instead of an external service.

Superpowers by Jesse Vincent and contributors. The agentic skills framework that provides foundational workflows for brainstorming, test-driven development, systematic debugging, and plan writing. Superpowers proved that disciplined agent workflows are not overhead. They are what make autonomous development reliable.

Get Started

The repo has everything: agent definitions, skills, commands, rubrics, knowledge templates, and full documentation.

View on GitHub

Star it. Clone it. File issues. MIT licensed.