Architecture

Agent Alchemy extends Claude Code into a structured development platform through three integrated pillars: a markdown-as-code plugin system, a real-time task dashboard, and a VS Code authoring extension. This page explains how these components are designed, how they interact, and the key patterns that hold the system together.

System Overview

At a high level, Agent Alchemy is a monorepo organized around three independent but complementary subsystems. The plugin system defines development workflows as markdown files that Claude Code executes directly. The task manager provides visibility into those workflows through a real-time Kanban board. The VS Code extension supports plugin authoring with schema validation and autocompletion.

graph TB
    subgraph Plugins["Plugin System (claude/)"]
        direction TB
        CT[core-tools]
        DT[dev-tools]
        ST[sdd-tools]
        TT[tdd-tools]
        GT[git-tools]
    end

    subgraph Apps["Task Dashboard (apps/)"]
        TM[Task Manager<br/>Next.js 16]
    end

    subgraph Extensions["Authoring (extensions/)"]
        VS[VS Code Extension<br/>Schema Validation]
    end

    CC[Claude Code CLI] -->|loads & executes| Plugins
    CC -->|reads & writes| FS["~/.claude/tasks/*.json"]
    FS -->|file watcher| TM
    VS -->|validates| Plugins

    style Plugins fill:#7c3aed,color:#fff
    style Apps fill:#06b6d4,color:#fff
    style Extensions fill:#059669,color:#fff
    style CC fill:#1e293b,color:#fff

Repository Structure

agent-alchemy/
├── claude/                        # Plugin system (markdown-as-code)
│   ├── .claude-plugin/            # Plugin marketplace registry
│   ├── core-tools/                # Codebase analysis, deep exploration
│   │   ├── skills/                # deep-analysis, codebase-analysis, ...
│   │   ├── agents/                # code-explorer, code-synthesizer, code-architect
│   │   └── hooks/                 # Lifecycle hooks (auto-approve)
│   ├── dev-tools/                 # Feature dev, code review, docs
│   │   ├── skills/                # feature-dev, docs-manager, ...
│   │   └── agents/                # code-reviewer, changelog-manager, ...
│   ├── sdd-tools/                 # Spec-Driven Development pipeline
│   │   ├── skills/                # create-spec, create-tasks, execute-tasks, ...
│   │   └── agents/                # researcher, spec-analyzer, task-executor
│   ├── tdd-tools/                 # Test-Driven Development workflows
│   │   ├── skills/                # tdd-cycle, generate-tests, analyze-coverage
│   │   └── agents/                # tdd-executor, test-writer, test-reviewer
│   ├── git-tools/                 # Git commit automation
│   │   └── skills/                # git-commit
│   └── plugin-tools/              # Plugin porting and ecosystem health
│       ├── skills/                # port-plugin, validate-adapter, ...
│       └── agents/                # researcher, port-converter
├── apps/
│   └── task-manager/              # Next.js 16 real-time Kanban dashboard
├── extensions/
│   └── vscode/                    # VS Code extension for plugin authoring
└── internal/                      # Internal documentation and analysis

Markdown-as-Code

Agent Alchemy's core innovation is encoding AI agent instructions, workflows, and team coordination logic in plain markdown files. Instead of writing code to orchestrate agents, you write structured markdown that Claude Code interprets and executes directly.

Why Markdown?

Traditional agent frameworks require you to write orchestration code in Python, TypeScript, or another programming language. Agent Alchemy takes a different approach: the markdown is the program. This means:

  • No runtime dependencies — plugins are just files, no build step required
  • Human-readable workflows — anyone can read a SKILL.md and understand what it does
  • Version-controlled prompts — skills live alongside code and evolve with git
  • Composable by design — skills load other skills through file reads, like imports

Skills and Agents

The plugin system is built on two primitives: skills and agents.

A skill is a workflow definition. The YAML frontmatter declares metadata, tool permissions, and invocation rules. The markdown body contains the step-by-step instructions Claude Code follows.

claude/git-tools/skills/git-commit/SKILL.md
---
name: git-commit
description: Commit staged changes with conventional commit message.
model: haiku
user-invocable: true
disable-model-invocation: false
allowed-tools: Bash, AskUserQuestion
---

# Git Commit

Create a commit with a conventional commit message...

## Workflow
### Step 1: Check Repository State
...

An agent is a specialized worker that a skill can spawn. The frontmatter declares the model tier, available tools, and skills the agent loads into its context.

claude/core-tools/agents/code-explorer.md
---
name: code-explorer
description: Explores codebases to find relevant files and map architecture
model: sonnet
tools:
  - Read
  - Glob
  - Grep
  - Bash
  - SendMessage
  - TaskUpdate
  - TaskGet
  - TaskList
skills:
  - project-conventions
  - language-patterns
---

# Code Explorer Agent

You are a code exploration specialist working as part of a
collaborative analysis team...
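
Both file types share the same two-part layout: YAML frontmatter between `---` fences, followed by a markdown body. As a rough illustration of that layout, the sketch below separates the two parts. `splitFrontmatter` and `ParsedPluginFile` are hypothetical names introduced here; they are not part of Agent Alchemy or Claude Code, and the YAML is left unparsed.

```typescript
// Hypothetical helper, not part of Agent Alchemy or Claude Code.
interface ParsedPluginFile {
  frontmatter: string; // raw YAML between the --- fences, left unparsed
  body: string; // markdown instructions after the closing fence
}

function splitFrontmatter(source: string): ParsedPluginFile {
  const lines = source.split(/\r?\n/);
  // A file without an opening fence has no frontmatter at all.
  if (lines[0]?.trim() !== "---") {
    return { frontmatter: "", body: source };
  }
  // Find the closing fence, searching from the second line onward.
  const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (close === -1) {
    throw new Error("Unterminated frontmatter: missing closing ---");
  }
  return {
    frontmatter: lines.slice(1, close).join("\n"),
    // Drop blank lines between the closing fence and the first heading.
    body: lines.slice(close + 1).join("\n").replace(/^\n+/, ""),
  };
}

const example = [
  "---",
  "name: git-commit",
  "model: haiku",
  "---",
  "",
  "# Git Commit",
].join("\n");

const parsed = splitFrontmatter(example);
console.log(parsed.frontmatter);
console.log(parsed.body);
```

A real tool would hand `parsed.frontmatter` to a YAML parser before checking individual fields.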

Progressive Knowledge Loading

Large knowledge bases are externalized into references/ subdirectories within each skill. Rather than loading everything upfront, skills load reference material on demand as specific phases require it. The project contains 30+ reference files across skills, keeping individual skill files focused while making detailed guidance available when needed.

claude/sdd-tools/skills/create-spec/
├── SKILL.md                              # Main workflow (664 lines)
└── references/
    ├── interview-questions.md            # Loaded during interview phase
    ├── recommendation-triggers.md        # Loaded when generating recommendations
    └── recommendation-format.md          # Loaded when formatting output
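
In Agent Alchemy the loading is done by Claude Code following the skill's markdown instructions, not by application code. As an analogy only, the on-demand behavior resembles a lazy loader that reads a reference file the first time a phase needs it. The `ReferenceLoader` class and the phase names below are hypothetical, and the file reader is injected so the sketch runs without touching disk.

```typescript
// Hypothetical analogy for on-demand reference loading. The phase names and
// this class are illustrative and do not exist in the plugin.
type ReadFile = (path: string) => string;

class ReferenceLoader {
  private readonly cache = new Map<string, string>();
  private readonly phaseToFile: Record<string, string>;
  private readonly readFile: ReadFile;

  constructor(phaseToFile: Record<string, string>, readFile: ReadFile) {
    this.phaseToFile = phaseToFile;
    this.readFile = readFile;
  }

  // Reads the reference for a phase on first request, then serves
  // later requests for the same file from the cache.
  load(phase: string): string {
    const file = this.phaseToFile[phase];
    if (file === undefined) throw new Error(`No reference for phase: ${phase}`);
    const cached = this.cache.get(file);
    if (cached !== undefined) return cached;
    const content = this.readFile(file);
    this.cache.set(file, content);
    return content;
  }
}

// Fake reader that records which files were actually read.
const reads: string[] = [];
const loader = new ReferenceLoader(
  {
    interview: "references/interview-questions.md",
    recommendations: "references/recommendation-triggers.md",
    formatting: "references/recommendation-format.md",
  },
  (path) => {
    reads.push(path);
    return `contents of ${path}`;
  },
);

loader.load("interview");
loader.load("interview"); // served from cache, no second read
console.log(reads);
```

Only the interview reference is read here; the other two files stay untouched until their phases run.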

Plugin Inventory

The platform ships with 6 plugin groups, 28 skills, 16 agents, and 30+ reference files. See the Plugins documentation for the full catalog.


Plugin Composition Patterns

Skills do not compose through function calls or import statements. Instead, a skill loads another skill's markdown file at runtime, injecting its full instructions into the current context. This is composition through prompt injection.

Skill Loading via Prompt Injection

When feature-dev needs codebase exploration, it reads the deep-analysis SKILL.md and follows its workflow as if it were part of its own instructions:

claude/dev-tools/skills/feature-dev/SKILL.md (Phase 2)
## Phase 2: Codebase Exploration

1. **Run deep-analysis workflow:**
   - Read `${CLAUDE_PLUGIN_ROOT}/../core-tools/skills/deep-analysis/SKILL.md`
     and follow its workflow
   - Pass the feature description from Phase 1 as the analysis context

The ${CLAUDE_PLUGIN_ROOT} variable resolves to the current plugin's root directory at runtime. Cross-plugin references use the /../{source-dir-name}/ pattern to navigate between plugin groups.

Cross-Plugin Reference Convention

Always use ${CLAUDE_PLUGIN_ROOT}/../{source-dir-name}/ for cross-plugin references (e.g., /../core-tools/). Same-plugin references use ${CLAUDE_PLUGIN_ROOT}/ directly. Never use full marketplace names in path references.
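
To make the convention concrete, the sketch below shows how such a reference collapses to a concrete path once the variable is substituted. Claude Code performs the real substitution; `resolvePluginPath` is a hypothetical helper, and the `/repo/claude/dev-tools` root is an assumed install location chosen for illustration. The sketch assumes absolute POSIX-style paths.

```typescript
// Hypothetical resolver, shown only to illustrate the path convention.
function resolvePluginPath(reference: string, pluginRoot: string): string {
  // Substitute the variable, then collapse "." and ".." segments.
  const expanded = reference.split("${CLAUDE_PLUGIN_ROOT}").join(pluginRoot);
  const resolved: string[] = [];
  for (const segment of expanded.split("/")) {
    if (segment === "" || segment === ".") continue; // skip empty and no-op segments
    if (segment === "..") resolved.pop(); // step up one directory
    else resolved.push(segment);
  }
  return "/" + resolved.join("/");
}

const root = "/repo/claude/dev-tools"; // assumed install location

// Cross-plugin: steps up out of dev-tools and into core-tools.
const crossPlugin = resolvePluginPath(
  "${CLAUDE_PLUGIN_ROOT}/../core-tools/skills/deep-analysis/SKILL.md",
  root,
);

// Same-plugin: stays under the current plugin root.
const samePlugin = resolvePluginPath(
  "${CLAUDE_PLUGIN_ROOT}/skills/feature-dev/SKILL.md",
  root,
);

console.log(crossPlugin);
console.log(samePlugin);
```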

Hub-and-Spoke Team Coordination

The deep-analysis skill implements a hub-and-spoke pattern for parallel codebase exploration. A lead agent (the skill executor) performs reconnaissance, composes a team plan, spawns N explorer agents and 1 synthesizer agent, then coordinates their work:

graph TD
    Lead["Lead Agent<br/>(Skill Executor)"]

    Lead -->|"spawns & assigns"| E1["Explorer 1<br/>(Sonnet)"]
    Lead -->|"spawns & assigns"| E2["Explorer 2<br/>(Sonnet)"]
    Lead -->|"spawns & assigns"| E3["Explorer 3<br/>(Sonnet)"]
    Lead -->|"spawns & assigns"| Syn["Synthesizer<br/>(Opus)"]

    E1 -->|"findings"| Syn
    E2 -->|"findings"| Syn
    E3 -->|"findings"| Syn
    Syn -.->|"follow-up questions"| E1
    Syn -.->|"follow-up questions"| E2
    Syn -.->|"follow-up questions"| E3
    Syn -->|"unified analysis"| Lead

    style Lead fill:#7c3aed,color:#fff
    style E1 fill:#06b6d4,color:#fff
    style E2 fill:#06b6d4,color:#fff
    style E3 fill:#06b6d4,color:#fff
    style Syn fill:#f59e0b,color:#000

Key characteristics:

  • Explorers work independently — no cross-worker messaging (hub-and-spoke topology)
  • Synthesizer can ask follow-ups — resolves conflicts and fills gaps by messaging specific explorers
  • Synthesizer has Bash access — can investigate git history, dependency trees, and run static analysis when file reads are insufficient
  • Task dependencies enforce order — the synthesis task is blocked by all exploration tasks
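
The ordering rule in the last point can be sketched as a dependency check: a task becomes ready only when everything blocking it has completed. The `Task` shape and its field names below are assumptions made for illustration; they are not the actual schema of the task JSON files.

```typescript
// Hypothetical task shape. The field names are illustrative only.
type Status = "pending" | "in_progress" | "completed";

interface Task {
  id: string;
  status: Status;
  blockedBy: string[];
}

// A task is ready when it is pending and every blocker has completed.
function readyTasks(tasks: Task[]): Task[] {
  const done = new Set(
    tasks.filter((t) => t.status === "completed").map((t) => t.id),
  );
  return tasks.filter(
    (t) => t.status === "pending" && t.blockedBy.every((id) => done.has(id)),
  );
}

const tasks: Task[] = [
  { id: "explore-1", status: "completed", blockedBy: [] },
  { id: "explore-2", status: "in_progress", blockedBy: [] },
  { id: "explore-3", status: "completed", blockedBy: [] },
  {
    id: "synthesize",
    status: "pending",
    blockedBy: ["explore-1", "explore-2", "explore-3"],
  },
];

// One explorer is still running, so synthesis stays blocked.
const before = readyTasks(tasks).map((t) => t.id);

// Once the last explorer finishes, synthesis is unblocked.
const finished: Task[] = tasks.map((t) =>
  t.id === "explore-2" ? { ...t, status: "completed" as const } : t,
);
const after = readyTasks(finished).map((t) => t.id);

console.log(before, after);
```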

Phase Workflows with Completeness Enforcement

Complex skills use numbered phases with explicit enforcement directives to prevent Claude from stopping prematurely. This is a critical pattern because language models tend to treat intermediate outputs as final results.

Phase enforcement pattern
**CRITICAL: Complete ALL 7 phases.** The workflow is not complete until
Phase 7: Summary is finished. After completing each phase, immediately
proceed to the next phase without waiting for user prompts.

Skills using this pattern:

| Skill | Phases | Purpose |
| --- | --- | --- |
| feature-dev | 7 | Discovery through Exploration, Design, Implementation, Review, and Summary |
| deep-analysis | 6 | Session Setup through Recon, Approval, Assembly, Exploration, and Synthesis |
| tdd-cycle | 7 | Discovery through Analysis, Plan, RED, GREEN, REFACTOR, and Report |
| bug-killer | 5 | Triage through Investigation, Root Cause, Fix & Verify, and Wrap-up |

Agent Tool Restrictions

Agents enforce separation of concerns through their tool permissions. Architect and reviewer agents are read-only — they can analyze code but cannot modify it. This ensures design and review phases cannot accidentally alter the codebase:

| Agent | Model | Tools | Access Level |
| --- | --- | --- | --- |
| code-explorer | Sonnet | Read, Glob, Grep, Bash, SendMessage | Read-only |
| code-synthesizer | Opus | Read, Glob, Grep, Bash, SendMessage | Read-only |
| code-architect (core-tools) | Opus | Read, Glob, Grep, SendMessage | Read-only |
| code-reviewer (dev-tools) | Opus | Read, Glob, Grep, SendMessage | Read-only |
| bug-investigator (dev-tools) | Sonnet | Read, Glob, Grep, Bash, SendMessage | Read-only |
| task-executor | | Read, Write, Edit, Glob, Grep, Bash | Full access |
| tdd-executor | Opus | Read, Write, Edit, Glob, Grep, Bash | Full access |
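
The access levels listed above follow from the tool lists alone. A minimal sketch of that classification, assuming only the Write and Edit tools count as write-capable, is shown below. `accessLevel` and `WRITE_TOOLS` are hypothetical names; Claude Code enforces the permissions itself from the agent frontmatter.

```typescript
type AccessLevel = "Read-only" | "Full access";

// Tools that modify files directly. Bash is deliberately not listed,
// mirroring the classification above, where agents that have Bash but
// neither Write nor Edit are still labeled read-only.
const WRITE_TOOLS = new Set(["Write", "Edit"]);

// Hypothetical classifier: an agent has full access if any of its
// tools can write to the codebase.
function accessLevel(tools: string[]): AccessLevel {
  return tools.some((tool) => WRITE_TOOLS.has(tool)) ? "Full access" : "Read-only";
}

const architect = accessLevel(["Read", "Glob", "Grep", "SendMessage"]);
const explorer = accessLevel(["Read", "Glob", "Grep", "Bash", "SendMessage"]);
const executor = accessLevel(["Read", "Write", "Edit", "Glob", "Grep", "Bash"]);

console.log(architect, explorer, executor);
```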

AskUserQuestion Enforcement

All interactive skills route user interaction through the AskUserQuestion tool rather than plain text output. This ensures structured, parseable responses and prevents skills from continuing without explicit user input when a decision point is reached.


Model Tiering Strategy

Agent Alchemy assigns Claude models to agents based on the cognitive demands of their task. This balances quality against cost and latency.

graph LR
    subgraph Opus["Opus — High Reasoning"]
        S1[Synthesis]
        S2[Architecture Design]
        S3[Code Review]
        S4[TDD Execution]
    end

    subgraph Sonnet["Sonnet — Parallel Workers"]
        W1[Code Exploration]
        W2[Test Writing]
        W3[Research]
    end

    subgraph Haiku["Haiku — Simple Tasks"]
        H1[Git Commits]
    end

    style Opus fill:#f59e0b,color:#000
    style Sonnet fill:#06b6d4,color:#fff
    style Haiku fill:#10b981,color:#fff

| Tier | Model | Used For | Rationale |
| --- | --- | --- | --- |
| Opus | Most capable | Synthesis, architecture, review, TDD execution | Tasks requiring deep reasoning, cross-cutting analysis, and judgment calls |
| Sonnet | Balanced | Exploration, test writing, research | Parallelizable tasks that benefit from broad search rather than deep reasoning |
| Haiku | Fastest | Git commits | Simple, well-defined tasks where speed matters more than reasoning depth |

Cost Optimization

The hub-and-spoke pattern in deep-analysis uses Sonnet for N parallel explorers (the expensive, parallelized part) and reserves a single Opus instance for the synthesizer. This keeps costs proportional to codebase complexity while maintaining synthesis quality.
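
The arithmetic behind this trade-off can be sketched as follows. The per-agent costs are caller-supplied placeholders, not published prices, and `tieredCost` and `allOpusCost` are hypothetical helpers that ignore token counts and other real-world factors.

```typescript
// Relative cost of one deep-analysis run. The per-agent costs are
// placeholders supplied by the caller, not real pricing.
interface AgentCosts {
  sonnet: number; // cost of one explorer run
  opus: number; // cost of one synthesizer run
}

// N Sonnet explorers plus a single Opus synthesizer.
function tieredCost(explorers: number, costs: AgentCosts): number {
  return explorers * costs.sonnet + costs.opus;
}

// The same team if every agent ran on Opus.
function allOpusCost(explorers: number, costs: AgentCosts): number {
  return (explorers + 1) * costs.opus;
}

// Placeholder units chosen only to make the arithmetic easy to follow.
const costs: AgentCosts = { sonnet: 1, opus: 5 };
const tiered = tieredCost(3, costs); // 3 * 1 + 5 = 8
const uniform = allOpusCost(3, costs); // 4 * 5 = 20

console.log(tiered, uniform);
```

In the tiered setup, each additional explorer adds one Sonnet unit rather than one Opus unit, so the gap widens as the team grows.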


Cross-Plugin Dependency Graph

The deep-analysis skill in core-tools is the keystone skill of the entire platform. Four skills across three plugin groups depend on it for codebase understanding:

graph TD
    subgraph core-tools
        DA[deep-analysis]
        CA[codebase-analysis]
    end

    subgraph dev-tools
        FD[feature-dev]
        DM[docs-manager]
    end

    subgraph sdd-tools
        CS[create-spec]
    end

    CA -->|wraps with reporting| DA
    FD -->|Phase 2: exploration| DA
    DM -->|codebase understanding| DA
    CS -.->|optional: new feature specs| DA

    FD -->|Phase 4| AR[code-architect x2-3<br/><i>core-tools</i>]
    FD -->|Phase 6| CR[code-reviewer x3]
    DA -->|spawns| EX[code-explorer x N]
    DA -->|spawns| SY[code-synthesizer x 1]

    style DA fill:#7c3aed,color:#fff
    style CA fill:#7c3aed,color:#fff
    style FD fill:#06b6d4,color:#fff
    style DM fill:#06b6d4,color:#fff
    style CS fill:#059669,color:#fff

Key Composition Chains

The full end-to-end workflows chain multiple skills and agents together:

feature-dev
  └─ deep-analysis (Phase 2)
       ├─ code-explorer (Sonnet) x N  — parallel exploration
       └─ code-synthesizer (Opus) x 1 — merge + investigate
  └─ code-architect (core-tools, Opus) x 2-3 — competing designs (Phase 4)
  └─ code-reviewer (Opus) x 3        — parallel review focuses (Phase 6)

create-spec
  └─ deep-analysis (optional)        — codebase context for new features
  └─ researcher agent                — technical research

create-tasks → reads spec → generates task JSON
execute-tasks → task-executor agent x N per wave

tdd-cycle → tdd-executor (Opus) x 1 per feature
  └─ 7-phase RED-GREEN-REFACTOR lifecycle

bug-killer (quick track)
  └─ read error location, targeted investigation
  └─ fix + regression test → project-learnings

bug-killer (deep track)
  └─ code-explorer (core-tools, Sonnet) x 2-3
  └─ bug-investigator (Sonnet) x 1-3
  └─ code-quality for fix validation
  └─ project-learnings

Real-Time Data Flow (Task Manager)

The task manager provides a real-time Kanban board that visualizes task files written by Claude Code during workflow execution. It uses a file-system-first architecture — no database, no message queue. The file system is the data store.

sequenceDiagram
    participant CC as Claude Code
    participant FS as File System<br/>(~/.claude/tasks/)
    participant CK as Chokidar<br/>(File Watcher)
    participant SSE as SSE Endpoint<br/>(Route Handler)
    participant TQ as TanStack Query<br/>(Client Cache)
    participant UI as React UI<br/>(Kanban Board)

    CC->>FS: Write/update task JSON
    CK->>FS: Detect change (300ms polling)
    CK->>SSE: Emit taskEvent
    SSE->>TQ: Push SSE event
    TQ->>TQ: Invalidate query cache
    TQ->>UI: Trigger re-render
    UI->>UI: Update board columns

Pipeline Components

File Watcher (Server-Side)

The FileWatcher class uses Chokidar to monitor ~/.claude/tasks/ with 300ms polling. It emits typed events (task:created, task:updated, task:deleted) when JSON files change:

apps/task-manager/src/lib/fileWatcher.ts
// Global singleton pattern for development hot reload
// Prevents multiple file watchers during Next.js HMR
const globalForWatcher = globalThis as unknown as {
  fileWatcher: FileWatcher | undefined
}

export const fileWatcher = globalForWatcher.fileWatcher ?? new FileWatcher()

if (process.env.NODE_ENV !== 'production') {
  globalForWatcher.fileWatcher = fileWatcher
}

GlobalThis Singleton

The globalThis pattern is essential for Next.js development. Without it, every hot module replacement cycle would create a new FileWatcher, leaking file handles and producing duplicate events.
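
The FileWatcher internals are not shown on this page. As a hedged sketch of the event mapping it describes, the code below translates low-level file events into the typed task events. Chokidar watchers emit `add`, `change`, and `unlink` events with the file path; everything else here (`PathWatcher`, `bridgeWatcher`, the `TaskEvent` shape, the `/tmp/tasks` paths) is a hypothetical stand-in, and the watcher is faked so the sketch runs without Chokidar installed.

```typescript
// Minimal watcher shape: anything exposing on(event, listener).
// A Chokidar watcher should fit this shape, since it emits "add",
// "change", and "unlink" with the affected file path.
interface PathWatcher {
  on(event: "add" | "change" | "unlink", listener: (path: string) => void): void;
}

type TaskEventType = "task:created" | "task:updated" | "task:deleted";

// Hypothetical event payload; the real FileWatcher may carry more fields.
interface TaskEvent {
  type: TaskEventType;
  path: string;
}

const EVENT_MAP: Record<"add" | "change" | "unlink", TaskEventType> = {
  add: "task:created",
  change: "task:updated",
  unlink: "task:deleted",
};

// Forwards changes to JSON files as typed task events.
function bridgeWatcher(watcher: PathWatcher, emit: (event: TaskEvent) => void): void {
  for (const fsEvent of ["add", "change", "unlink"] as const) {
    watcher.on(fsEvent, (path) => {
      if (path.endsWith(".json")) emit({ type: EVENT_MAP[fsEvent], path });
    });
  }
}

// Fake watcher that stores listeners so events can be triggered by hand.
const listeners = new Map<string, (path: string) => void>();
const fakeWatcher: PathWatcher = {
  on(event, listener) {
    listeners.set(event, listener);
  },
};

const received: TaskEvent[] = [];
bridgeWatcher(fakeWatcher, (event) => {
  received.push(event);
});

listeners.get("change")?.("/tmp/tasks/1.json");
listeners.get("add")?.("/tmp/tasks/notes.txt"); // ignored: not a JSON file

console.log(received);
```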

SSE Bridge (Server to Client)

A Next.js Route Handler at /api/events converts FileWatcher events into Server-Sent Events. Each connected client receives a persistent stream scoped to a specific task list.
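
The Route Handler itself is not reproduced here. The sketch below covers only the wire format such a handler would write to the stream: a named Server-Sent Event is an `event:` line, a `data:` line, and a blank line. The `TaskEventPayload` shape and the `frameForClient` scoping check are assumptions made for illustration.

```typescript
// Serializes one event in the text/event-stream format: an "event:" line,
// a "data:" line, and a blank line that terminates the event.
// JSON.stringify never emits raw newlines, so one data line is enough.
function formatSseFrame(eventName: string, payload: unknown): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

// Hypothetical payload shape; the real event fields are not documented here.
interface TaskEventPayload {
  taskListId: string;
  taskId: string;
}

// Returns a frame only for events that belong to the client's task list.
function frameForClient(
  clientTaskListId: string,
  eventName: string,
  payload: TaskEventPayload,
): string | null {
  if (payload.taskListId !== clientTaskListId) return null;
  return formatSseFrame(eventName, payload);
}

const frame = frameForClient("demo", "task:updated", {
  taskListId: "demo",
  taskId: "1",
});
const skipped = frameForClient("demo", "task:updated", {
  taskListId: "other",
  taskId: "7",
});

console.log(JSON.stringify(frame));
console.log(skipped);
```

Because the frame carries an `event:` name, a browser `EventSource` dispatches it to listeners registered for that name rather than to the generic `message` handler.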

Cache Invalidation (Client-Side)

The useSSE hook listens for SSE events and invalidates the relevant TanStack Query cache entries, triggering React to re-fetch and re-render:

apps/task-manager/src/hooks/useSSE.ts
const handleTaskEvent = () => {
  queryClient.invalidateQueries({ queryKey: taskKeys.list(taskListId) })
  queryClient.invalidateQueries({ queryKey: taskListKeys.all })
  router.refresh()
}

eventSource.addEventListener('task:created', handleTaskEvent)
eventSource.addEventListener('task:updated', handleTaskEvent)
eventSource.addEventListener('task:deleted', handleTaskEvent)
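
The snippet above relies on `taskKeys` and `taskListKeys`, whose definitions are not shown on this page. A plausible shape, following the common query key factory pattern, is sketched below; the actual key arrays in the task manager may differ.

```typescript
// Hypothetical key factories; the real taskKeys and taskListKeys may differ.
const taskListKeys = {
  // Key shared by every task-list query.
  all: ["taskLists"] as const,
};

const taskKeys = {
  // Prefix shared by every task query.
  all: ["tasks"] as const,
  // Key for the tasks of one specific list.
  list: (taskListId: string) => [...taskKeys.all, "list", taskListId] as const,
};

const listKey = taskKeys.list("demo");
console.log(listKey);
console.log(taskListKeys.all);
```

Building keys from a shared prefix keeps invalidation targeted: invalidating `taskKeys.list(taskListId)` touches only the queries for that list.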

Lifecycle Hooks

Plugins can register lifecycle hooks that run before or after Claude Code tool invocations. Hooks are defined in hooks/hooks.json within a plugin group and execute shell commands with a timeout.

claude/core-tools/hooks/hooks.json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Write|Edit|Bash",
        "hooks": [
          {
            "type": "command",
            "command": "bash ${CLAUDE_PLUGIN_ROOT}/hooks/auto-approve-da-session.sh",
            "timeout": 5
          }
        ]
      }
    ]
  }
}

In this example, the core-tools plugin auto-approves file operations targeting deep-analysis session directories (~/.claude/sessions/), so checkpointing and cache writes do not require manual user confirmation.


Technology Stack

Plugin System

| Component | Technology | Purpose |
| --- | --- | --- |
| Skill definitions | YAML frontmatter + Markdown | Workflow instructions and metadata |
| Agent definitions | YAML frontmatter + Markdown | Worker specialization and tool permissions |
| Reference files | Markdown | Progressive knowledge loading |
| Lifecycle hooks | JSON config + shell scripts | Pre/post tool-use automation |
| Runtime | Claude Code CLI | Skill execution, agent spawning, team coordination |

Task Manager

| Component | Technology | Purpose |
| --- | --- | --- |
| Framework | Next.js 16 | App Router, Server Components, Route Handlers |
| UI Library | React 19 | Component rendering |
| State | TanStack Query 5 | Server state caching and invalidation |
| Styling | Tailwind CSS 4 | Utility-first CSS |
| Components | shadcn/ui (Radix) | Accessible UI primitives |
| File watching | Chokidar 5 | File system change detection |
| Theming | next-themes | SSR-safe dark/light mode |

VS Code Extension

| Component | Technology | Purpose |
| --- | --- | --- |
| Validation engine | Ajv | JSON Schema validation for YAML frontmatter |
| Schema format | JSON Schema | 7 schemas for plugin file types |
| Build tool | esbuild | Fast extension bundling |
| Activation | workspaceContains | Auto-activates in plugin workspaces |

Validated File Types

The VS Code extension validates seven file types: skill frontmatter, agent frontmatter, plugin.json, hooks.json, .mcp.json, .lsp.json, and marketplace.json. See VS Code Extension for details.


Design Decisions

Markdown over code for agent orchestration

Agent instructions are inherently natural language. Encoding them in markdown eliminates the impedance mismatch between "what you want the agent to do" and "how you express it." Skills are readable by humans and executable by Claude Code without a compilation step.

File system as the integration layer

The task manager reads JSON files that Claude Code writes to disk. This avoids coupling the dashboard to Claude Code's internals. Any process that writes correctly shaped JSON to ~/.claude/tasks/ becomes visible on the board. The task manager never writes back; it is purely observational.

Separation of analysis and execution

Read-only agents (explorers, architects, reviewers) cannot modify the codebase. Write-capable agents (executors) are not assigned architectural decisions. This separation enforces a review-then-act workflow and prevents accidental changes during analysis phases.

Model tiering for cost control

Running Opus for every agent would be prohibitively expensive at scale. By reserving Opus for synthesis and judgment tasks while using Sonnet for parallelizable exploration, the system keeps costs proportional to the depth of reasoning required rather than the breadth of search.