
Codex Swarm

A rapid prototype showcasing how AI can enhance real-world developer workflows in minutes.

Codex Mumbai 2026

🎓 SudoMeet

📌 Mumbai

🚀 8-hour Hackathon Project

Codex Swarm — DAG-Optimized Parallel Codex CLI Orchestrator

Executive Summary

Codex Swarm transforms a single natural-language product specification into a dependency-aware Directed Acyclic Graph (DAG) of coding tasks, then dispatches them to parallel Codex CLI agents running inside sandboxed Docker containers. A real-time neobrutalist dashboard renders the entire execution pipeline live, from spec refinement through task completion.

The core innovation is a frontier-based DAG scheduler that keeps Codex CLI agents busy: it distinguishes blocking from non-blocking tasks and runs as many tasks in parallel as the true data dependencies between them allow.

Problem Statement

Today's Codex CLI workflow is fundamentally sequential and manual:

  1. A developer writes a prompt, fires a single Codex agent, and waits.
  2. Once it finishes, they manually write the next prompt for the next piece of work.
  3. Tasks that could run concurrently are serialized — wasting time and API capacity.
  4. There's no dependency tracking, no conflict avoidance, and no orchestration layer.

Codex Swarm is built to remove this bottleneck: work that used to take an afternoon of copy-pasting prompts can run as minutes of autonomous, parallel execution.

Core Innovation: DAG-Based Task Scheduling

The Problem with Flat Task Lists

Most LLM orchestration tools treat tasks as a flat queue — execute one, then the next. Even "parallel" systems often run everything at once without understanding that Task C depends on Task A's output while Task B is completely independent.

Codex Swarm's Approach

Codex Swarm models task execution as a Directed Acyclic Graph where:

  • Nodes represent discrete, file-scoped coding tasks
  • Edges represent hard data dependencies (e.g., a service that imports a model must wait for the model to be created)
  • Independent branches in the graph execute simultaneously

This is not just a topological sort executed once. The orchestrator uses a frontier-based incremental scheduler that re-evaluates the ready set after every task completion:

                    ┌──────────┐
                    │ Spec     │
                    └────┬─────┘
                         │ LLM Decomposition
                    ┌────▼─────┐
              ┌─────┤ Task DAG ├─────┐
              │     └──────────┘     │
         ┌────▼───┐            ┌─────▼───┐
         │ Task A │            │ Task B  │  ← Independent: run in parallel
         │(models)│            │ (utils) │
         └────┬───┘            └─────┬───┘
              │                      │
         ┌────▼───┐            ┌─────▼───┐
         │ Task C │            │ Task D  │  ← Blocked: C waits on A, D waits on B
         │ (API)  │            │  (CLI)  │
         └────┬───┘            └─────────┘
              │
         ┌────▼───┐
         │ Task E │  ← Cascades only after C finishes
         │(tests) │
         └────────┘
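
To make the diagram concrete, here is a minimal sketch of how the ready frontier could be computed for that exact graph. The type and function names (`DagTask`, `NodeStatus`, `readyFrontier`) are assumptions made for illustration and are not taken from the project's types.ts.

```typescript
// Hypothetical shapes for illustration; the project's real types live in types.ts.
type NodeStatus = "queued" | "running" | "completed" | "failed";

interface DagTask {
  id: string;
  dependencies: string[]; // ids of tasks that must complete first
}

// The graph from the diagram: A -> C -> E and B -> D.
const exampleDag: DagTask[] = [
  { id: "A", dependencies: [] },
  { id: "B", dependencies: [] },
  { id: "C", dependencies: ["A"] },
  { id: "D", dependencies: ["B"] },
  { id: "E", dependencies: ["C"] },
];

// Returns the ids of queued tasks whose dependencies have all completed.
function readyFrontier(
  tasks: DagTask[],
  statuses: Map<string, NodeStatus>,
): string[] {
  return tasks
    .filter((t) => statuses.get(t.id) === "queued")
    .filter((t) => t.dependencies.every((d) => statuses.get(d) === "completed"))
    .map((t) => t.id);
}

const statusById = new Map<string, NodeStatus>();
for (const t of exampleDag) statusById.set(t.id, "queued");
const initialFrontier = readyFrontier(exampleDag, statusById); // A and B

// A finishes while B is still running: only C joins the frontier.
statusById.set("A", "completed");
statusById.set("B", "running");
const frontierAfterA = readyFrontier(exampleDag, statusById);
```

C becomes ready as soon as A completes, without waiting for B. That is the difference between re-evaluating the frontier and sorting the graph once.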

Blocking vs. Non-Blocking: Maximizing Utilization

The DAG structure inherently classifies tasks:

| Type | Definition | Scheduling Behavior |
|---|---|---|
| Non-blocking | Tasks with zero unmet dependencies | Dispatched immediately, up to maxConcurrency slots |
| Blocking | Tasks with unmet dependencies | Held in queued state until all prerequisites reach completed |

The scheduler runs on every state transition — not on a timer. When Task A completes, the scheduler instantly evaluates which blocked tasks are now unblocked and fills available concurrency slots. This event-driven approach eliminates idle time between waves.

Key implementation detail (from orchestrator.ts):

// Frontier-based scheduling — runs after EVERY task state change
private scheduleReady(): void {
  const ready = [...this.tasks.values()].filter((t) => {
    const state = this.taskStates.get(t.id)!;
    if (state.status !== 'queued') return false;
    // A task is ready only when ALL its dependencies are completed
    return t.dependencies.every(
      (dep) => this.taskStates.get(dep)?.status === 'completed'
    );
  });

  const slots = this.maxConcurrency - this.runningCount;
  const toStart = ready.slice(0, slots);

  for (const task of toStart) {
    this.dispatch(task);
  }
}

This is called in the finally block of every task dispatch — meaning the moment a slot opens, it's filled. No polling delay, no batch boundaries.
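
The effect of refilling slots on every completion can be seen in a small virtual-clock simulation. This is a sketch under stated assumptions: the durations are made up, `simulateFrontier` and `simulateWaves` are hypothetical helpers rather than code from orchestrator.ts, and the wave baseline ignores concurrency limits to stay short.

```typescript
interface SimTask {
  id: string;
  dependencies: string[];
  duration: number; // virtual time units (made-up numbers)
}

// Event-driven frontier scheduling: every time a task finishes, free slots
// are refilled at once. Returns the finish time of each task.
function simulateFrontier(tasks: SimTask[], maxConcurrency: number): Map<string, number> {
  const finishedAt = new Map<string, number>();
  let queued = tasks.slice();
  const running: { id: string; endsAt: number }[] = [];
  let now = 0;

  while (queued.length > 0 || running.length > 0) {
    // The scheduleReady step: start ready tasks while slots are free.
    const stillQueued: SimTask[] = [];
    for (const t of queued) {
      const ready = t.dependencies.every((d) => finishedAt.has(d));
      if (ready && running.length < maxConcurrency) {
        running.push({ id: t.id, endsAt: now + t.duration });
      } else {
        stillQueued.push(t);
      }
    }
    queued = stillQueued;
    if (running.length === 0) {
      throw new Error("No runnable task: cycle or unknown dependency");
    }

    // Jump to the next completion, the moment the finally block would fire.
    running.sort((a, b) => a.endsAt - b.endsAt);
    const done = running.shift()!;
    now = done.endsAt;
    finishedAt.set(done.id, now);
  }
  return finishedAt;
}

// Wave-based baseline: start a whole dependency level, wait for all of it.
function simulateWaves(tasks: SimTask[]): number {
  const done = new Set<string>();
  let remaining = tasks.slice();
  let now = 0;
  while (remaining.length > 0) {
    const wave = remaining.filter((t) => t.dependencies.every((d) => done.has(d)));
    if (wave.length === 0) throw new Error("Cycle or unknown dependency");
    now += Math.max(...wave.map((t) => t.duration));
    for (const t of wave) done.add(t.id);
    remaining = remaining.filter((t) => !done.has(t.id));
  }
  return now;
}

const simTasks: SimTask[] = [
  { id: "A", dependencies: [], duration: 2 },
  { id: "B", dependencies: [], duration: 6 },
  { id: "C", dependencies: ["A"], duration: 3 },
  { id: "D", dependencies: ["B"], duration: 1 },
  { id: "E", dependencies: ["C"], duration: 2 },
];

const frontierFinish = simulateFrontier(simTasks, 4);
const frontierMakespan = Math.max(...frontierFinish.values()); // 7
const waveMakespan = simulateWaves(simTasks); // 11
```

With these durations the frontier run finishes at time 7, while the wave baseline needs 11 because C and E sit idle until the slow task B closes the first wave.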

Cascading Failure Propagation

When a task fails, the orchestrator doesn't leave its dependents in a permanent queued state. It proactively fails downstream tasks with a clear causal message:

private failDependents(parentId: string): void {
  for (const [id, task] of this.tasks) {
    if (task.dependencies.includes(parentId)) {
      const state = this.taskStates.get(id)!;
      if (state.status === 'queued') {
        state.status = 'failed';
        state.error = `Dependency "${parentId}" failed`;
        // ... emit update
        this.failDependents(id); // Recursive propagation
      }
    }
  }
}

This recursive propagation gives the user immediate feedback about the blast radius of a failure and enables targeted retries. Retrying a parent task automatically resets its failed dependents to queued.
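
The retry reset could look roughly like the sketch below. It is an illustration only: `resetForRetry` and its types are hypothetical, and the rule that a dependent stays failed while another of its dependencies is still failed is an assumption, not documented behavior.

```typescript
type RetryStatus = "queued" | "running" | "completed" | "failed";

interface RetryTask {
  id: string;
  dependencies: string[];
}

interface RetryState {
  status: RetryStatus;
  error?: string;
}

// Re-queues the retried task, then walks downstream and re-queues dependents
// that no longer have any failed dependency. Returns the ids that were reset.
function resetForRetry(
  taskId: string,
  tasks: Map<string, RetryTask>,
  states: Map<string, RetryState>,
): string[] {
  const reset: string[] = [];

  const visit = (id: string): void => {
    const state = states.get(id);
    if (!state || state.status !== "failed") return;
    state.status = "queued";
    delete state.error;
    reset.push(id);

    for (const [childId, child] of tasks) {
      if (!child.dependencies.includes(id)) continue;
      // Assumption: leave the child failed if another parent is still failed.
      const blockedElsewhere = child.dependencies.some(
        (d) => states.get(d)?.status === "failed",
      );
      if (!blockedElsewhere) visit(childId);
    }
  };

  visit(taskId);
  return reset;
}
```

After the reset, a single scheduling pass picks the retried task up again, and its dependents follow through the normal frontier logic.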

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                      Svelte 5 Dashboard                          │
│   DAG Visualization │ Agent Feed │ Timeline │ Diff Viewer        │
│                  (WebSocket — real-time bidirectional)           │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
┌────────────────────────────────▼─────────────────────────────────┐
│                    Hono HTTP/WS Server                           │
│  ┌─────────────┐  ┌───────────────┐  ┌────────────────────┐      │
│  │  Spec       │  │  Decomposer   │  │   Orchestrator     │      │
│  │  Refiner    │  │  (DAG Gen)    │  │  (Frontier Sched)  │      │
│  │  (o3-mini)  │  │  (o3-mini)    │  │                    │      │
│  └──────┬──────┘  └──────┬────────┘  └────────┬───────────┘      │
│         │                │                     │                 │
│         ▼                ▼                     ▼                 │
│   Refined Spec  →  Task DAG JSON  →  Parallel Dispatch           │
└────────────────────────────────┬─────────────────────────────────┘
                                 │
              ┌──────────────────┼──────────────────┐
              │                  │                  │
     ┌────────▼────────┐ ┌──────▼───────┐ ┌────────▼────────┐
     │ Docker Container│ │Docker Contner│ │Docker Container │
     │  (4GB / 2 CPU)  │ │ (4GB / 2 CPU)│ │  (4GB / 2 CPU)  │
     │  Sandbox Copy   │ │ Sandbox Copy │ │  Sandbox Copy   │
     │  of Repo        │ │ of Repo      │ │  of Repo        │
     └─────────────────┘ └──────────────┘ └─────────────────┘

Component Breakdown

| Component | Tech | Role |
|---|---|---|
| Spec Refiner | OpenAI Chat API (o3-mini) | Converts rough user input into a precise engineering brief |
| Decomposer | OpenAI Chat API (o3-mini, JSON mode) | Generates file-scoped task DAG with explicit dependency edges |
| Orchestrator | TypeScript, EventEmitter | Frontier-based scheduler, concurrency control, lifecycle management |
| Agent Planner | OpenAI Chat API (o3-mini, JSON mode) | Generates concrete write_file and shell action plans per task |
| Sandbox Runner | Docker, child_process | Isolated container execution, diff capture, change application |
| Dashboard | Svelte 5 (Runes), Tailwind CSS, WebSocket | Real-time DAG visualization, agent logs, diff viewer |

Docker Sandboxing: Security Through Isolation

Each Codex agent runs inside a resource-constrained Docker container — not directly on the host. This is a critical security boundary: LLM-generated code is untrusted by default.

Sandbox Lifecycle

1. createSandboxCopy()  →  Full repo copy to /tmp path (no shared state)
2. docker run           →  Container with repo mounted at /workspace
                            --memory 4g --cpus 2 (resource caps)
3. docker exec          →  LLM-planned actions executed sequentially
4. getSandboxDiff()     →  git diff captured for review
5. applySandboxChanges()→  Only changed files copied back to original repo
6. docker kill + rm     →  Container destroyed, sandbox deleted
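
Step 2 of the lifecycle can be sketched as a function that assembles the `docker run` arguments. The helper name, the container and image names, and the `sleep infinity` keep-alive command are assumptions; only the `/workspace` mount and the `--memory 4g --cpus 2` caps come from the description above.

```typescript
interface SandboxLimits {
  memory: string; // value for --memory, e.g. "4g"
  cpus: number; // value for --cpus
}

// Builds the argv for `docker run`. Nothing is executed here; a caller
// could hand the array to child_process.spawn("docker", args).
function buildDockerRunArgs(
  containerName: string,
  sandboxPath: string,
  image: string,
  limits: SandboxLimits = { memory: "4g", cpus: 2 },
): string[] {
  return [
    "run",
    "--detach",
    "--name", containerName,
    "--memory", limits.memory,
    "--cpus", String(limits.cpus),
    "--volume", `${sandboxPath}:/workspace`,
    "--workdir", "/workspace",
    image,
    // Assumed keep-alive command so later `docker exec` calls have a target.
    "sleep", "infinity",
  ];
}

// Hypothetical names, for illustration only.
const dockerArgs = buildDockerRunArgs("swarm-task-a", "/tmp/sbx-a", "codex-swarm-agent");
```

Passing the arguments as an array rather than a single shell string avoids quoting problems when a sandbox path contains spaces.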

Why This Matters

| Threat | Mitigation |
|---|---|
| Malicious code execution | Runs inside container with resource limits, not on host |
| File system escape | Sandbox is a full copy; original repo untouched until explicit apply step |
| Resource exhaustion | --memory 4g --cpus 2 hard caps per container |
| Cross-task interference | Each task gets its own sandbox copy and container, giving complete isolation |
| Supply chain attacks | Container image is built from a controlled Dockerfile.agent with minimal tooling |

The Docker image (Dockerfile.agent) is deliberately minimal: Node 22, git, curl, build-essential, and Python 3 — nothing more.

Diff-Before-Apply Pattern

Changes aren't blindly merged. The orchestrator:

  1. Captures a git diff --cached from the sandbox
  2. Streams the diff to the dashboard for visual review
  3. Copies only the files listed in git diff --cached --name-only back to the original repo

This means the user sees exactly what changed before it lands.
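
A rough sketch of step 3 follows: turning the `git diff --cached --name-only` output into a copy-back plan. The helper names are hypothetical. It also assumes the sandbox stages its changes first (for example with `git add -A`), since `--cached` only reports staged changes.

```typescript
// Turns the stdout of `git diff --cached --name-only` into a list of
// repo-relative paths that are safe to copy back.
function parseChangedFiles(nameOnlyOutput: string): string[] {
  return nameOnlyOutput
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    // Refuse absolute paths and anything that climbs out of the repo.
    .filter((p) => !p.startsWith("/") && !p.split("/").includes(".."));
}

// Pairs each changed file with its source (sandbox) and destination (repo).
function planCopyBack(
  changed: string[],
  sandboxRoot: string,
  repoRoot: string,
): { from: string; to: string }[] {
  return changed.map((p) => ({
    from: `${sandboxRoot}/${p}`,
    to: `${repoRoot}/${p}`,
  }));
}

const changedFiles = parseChangedFiles("src/models/user.ts\nsrc/api/routes.ts\n");
const copyPlan = planCopyBack(changedFiles, "/tmp/sbx-a", "/home/dev/project");
```

Deleted files also appear in `--name-only` output and would need a delete rather than a copy; the sketch leaves that case out.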

LLM-Driven Agent Architecture

Unlike systems that shell out to codex --full-auto and hope for the best, Codex Swarm uses a two-phase LLM architecture:

Phase 1: Server-Side Planning (OpenAI SDK)

The planTask() function (agent.ts) calls the LLM with:

  • The task description and target files
  • The current project file tree for context
  • A structured system prompt requesting a JSON action plan

The LLM responds with concrete actions:

{
  "actions": [
    { "type": "shell", "command": "mkdir -p /workspace/src/models" },
    { "type": "write_file", "path": "/workspace/src/models/user.ts", "content": "..." },
    { "type": "shell", "command": "cd /workspace && npm install zod" }
  ]
}
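
Because the plan comes from a model, it is worth checking its shape before anything runs. The sketch below types the two action kinds shown above as a discriminated union and drops malformed entries; `AgentAction` and `parseActionPlan` are illustrative names, not the contents of agent.ts.

```typescript
type AgentAction =
  | { type: "shell"; command: string }
  | { type: "write_file"; path: string; content: string };

// Parses the model's JSON reply and keeps only well-formed actions.
// Throws if the top-level shape is wrong.
function parseActionPlan(raw: string): AgentAction[] {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Plan must be a JSON object");
  }
  const actions = (parsed as { actions?: unknown }).actions;
  if (!Array.isArray(actions)) {
    throw new Error('Plan must contain an "actions" array');
  }

  const valid: AgentAction[] = [];
  for (const candidate of actions) {
    if (typeof candidate !== "object" || candidate === null) continue;
    const rec = candidate as Record<string, unknown>;
    if (rec.type === "shell" && typeof rec.command === "string") {
      valid.push({ type: "shell", command: rec.command });
    } else if (
      rec.type === "write_file" &&
      typeof rec.path === "string" &&
      typeof rec.content === "string"
    ) {
      valid.push({ type: "write_file", path: rec.path, content: rec.content });
    }
  }
  return valid;
}

const parsedPlan = parseActionPlan(
  '{"actions":[{"type":"shell","command":"mkdir -p /workspace/src/models"},' +
    '{"type":"write_file","path":"/workspace/src/models/user.ts","content":"..."}]}',
);
```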

Phase 2: Docker-Side Execution

Each action is executed sequentially inside the container via docker exec. The separation ensures:

  • The LLM never has direct host access — it only produces a plan
  • The execution environment is sandboxed — the plan runs inside Docker
  • Partial failures don't abort — a failed npm install doesn't prevent subsequent file writes

Output from every action is streamed in real-time to the dashboard via WebSocket.
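
The "partial failures don't abort" behavior can be sketched as a loop that records each failure and moves on. The `run` callback stands in for whatever executes one action inside the container (a `docker exec` wrapper, for instance). It is synchronous here only to keep the sketch short, and all names are hypothetical.

```typescript
type PlannedAction =
  | { type: "shell"; command: string }
  | { type: "write_file"; path: string; content: string };

interface ActionResult {
  action: PlannedAction;
  ok: boolean;
  output: string;
}

// Runs actions in order. A failing action is recorded and reported through
// onOutput, but it does not stop the actions that come after it.
function executePlan(
  actions: PlannedAction[],
  run: (action: PlannedAction) => string,
  onOutput: (chunk: string) => void,
): ActionResult[] {
  const results: ActionResult[] = [];
  for (const action of actions) {
    try {
      const output = run(action);
      onOutput(output);
      results.push({ action, ok: true, output });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      onOutput(`[failed] ${message}`);
      results.push({ action, ok: false, output: message });
    }
  }
  return results;
}
```

The `onOutput` callback is where a real implementation would forward chunks to the dashboard.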

Real-Time Dashboard: Neobrutalist Design

The dashboard is built with Svelte 5 Runes and a custom neobrutalist design system — thick borders, hard shadows, no border radius, high-contrast accent colors, and bold typography.

Why Neobrutalism for a Developer Tool?

Neobrutalism solves a specific problem: information density without visual ambiguity. When monitoring 4+ parallel agents:

  • Zero border-radius → Hard geometric edges make panel boundaries unambiguous
  • 3px solid borders → Panels are visually distinct even at a glance
  • Status colors (cyan/green/red/yellow) → Instant task state recognition without reading labels
  • Offset box shadows → Depth hierarchy is immediately clear, no subtle gradients to decode
  • Monospace fonts for data → Diffs, logs, and output align perfectly

CSS Architecture

The design system (app.css) defines reusable neobrutalist primitives:

.nb-panel {
  border: 3px solid var(--border);
  box-shadow: 4px 4px 0px var(--border);
  background-color: var(--card);
}

.nb-shadow-hover:hover {
  box-shadow: 6px 6px 0px var(--border);
  transform: translate(-2px, -2px);
}

.nb-shadow-hover:active {
  box-shadow: 1px 1px 0px var(--border);
  transform: translate(3px, 3px);
}

Both light (#FFFBEB warm cream) and dark (#1a1a2e deep navy) themes are fully implemented with matching accent colors.

Dashboard Panels

| Panel | Purpose | Update Frequency |
|---|---|---|
| DAG Graph | SVG-rendered task nodes with dependency edges, color-coded by status | Every task_update event |
| Agent Feed | Streaming terminal output from each container, per-task filterable | Every agent_output chunk |
| Timeline | Gantt-style horizontal bars showing task start/end times and overlap | Every status transition |
| Diff Viewer | Syntax-highlighted unified diff for each completed task | On task completion |
| Summary | Live counters: running, queued, completed, failed, elapsed time | Derived reactively |
| Export Panel | Download full run data for post-mortem analysis | On demand |

Reactive State Management

The store (swarm.svelte.ts) uses Svelte 5's $state and $derived runes for zero-boilerplate reactivity:

// All derived values recompute automatically on any task state change
get running() { return this.tasks.filter(t => t.status === 'running').length; }
get queued()  { return this.tasks.filter(t => t.status === 'queued').length; }

End-to-End Pipeline

User Input  →  Refine  →  Decompose  →  Schedule  →  Execute  →  Merge  →  Visualize
   │              │            │              │            │           │           │
   │ raw spec     │ o3-mini    │ o3-mini      │ frontier   │ Docker    │ file      │ WebSocket
   │              │ chat API   │ JSON mode    │ scheduler  │ sandbox   │ copy      │ push
   │              │            │              │            │           │           │
   ▼              ▼            ▼              ▼            ▼           ▼           ▼
 "Build a     Clear engg    Task DAG     Ready set     Isolated    Changed    DAG lights
  REST API    brief with    with edges   computed,     containers  files      up green
  with auth"  file paths    & deps       slots filled  per task    applied    in real-time

Detailed Flow

  1. Spec Refinement — Raw user text → o3-mini → structured engineering brief with file paths, tech stack, and data shapes.
  2. Repository Context — git ls-files captures the current project structure, providing the decomposer with awareness of existing code.
  3. DAG Generation — Refined spec + file tree → o3-mini (JSON mode) → validated task graph. Dependencies are verified (all referenced IDs must exist).
  4. Frontier Scheduling — Tasks with zero unmet deps are dispatched immediately up to maxConcurrency. Each completion triggers a re-scan.
  5. Sandboxed Execution — Full repo copy → Docker container → LLM-planned actions → write_file and shell commands executed in sequence.
  6. Diff Capture — git diff --cached from sandbox, streamed to dashboard.
  7. Change Application — Only modified files are copied back to the original repo.
  8. Cascading Unlock — Completed task → dependents re-evaluated → next wave dispatched.
  9. Completion — When all tasks are completed or failed, swarm_complete is emitted with total time and success/failure counts.
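
The validation mentioned in step 3 might look like the sketch below. The source only states that referenced IDs must exist. The duplicate-id check and the cycle check (Kahn's algorithm) are additions for illustration, and `validateDag` is a hypothetical name.

```typescript
interface GraphTask {
  id: string;
  dependencies: string[];
}

// Returns a list of problems; an empty list means the graph is usable.
function validateDag(tasks: GraphTask[]): string[] {
  const problems: string[] = [];
  const ids = new Set(tasks.map((t) => t.id));
  if (ids.size !== tasks.length) problems.push("Duplicate task ids");

  for (const t of tasks) {
    for (const dep of t.dependencies) {
      if (!ids.has(dep)) {
        problems.push(`Task "${t.id}" depends on unknown id "${dep}"`);
      }
    }
  }
  if (problems.length > 0) return problems;

  // Kahn's algorithm: repeatedly remove tasks with no remaining dependencies.
  // Dependencies are de-duplicated so a repeated edge is counted once.
  const remainingDeps = new Map<string, number>();
  for (const t of tasks) remainingDeps.set(t.id, new Set(t.dependencies).size);
  const queue = tasks.filter((t) => t.dependencies.length === 0).map((t) => t.id);
  let removed = 0;
  while (queue.length > 0) {
    const id = queue.shift()!;
    removed++;
    for (const t of tasks) {
      if (!t.dependencies.includes(id)) continue;
      const left = remainingDeps.get(t.id)! - 1;
      remainingDeps.set(t.id, left);
      if (left === 0) queue.push(t.id);
    }
  }
  if (removed !== tasks.length) problems.push("Dependency cycle detected");
  return problems;
}
```

Rejecting a cyclic graph up front matters for the scheduler: tasks in a cycle would never become ready and would sit in the queued state forever.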

WebSocket Protocol

All communication between server and dashboard uses a typed WebSocket protocol defined in types.ts:

| Direction | Message Type | Payload | When |
|---|---|---|---|
| Client → Server | start | { spec, repoPath, maxConcurrency } | User clicks "Launch Swarm" |
| Client → Server | cancel | (none) | User cancels the run |
| Client → Server | retry | { taskId } | User retries a failed task |
| Server → Client | refining | { spec } | Spec refinement begins |
| Server → Client | refined | { spec } | Refined spec ready |
| Server → Client | dag | { tasks[] } | DAG generated |
| Server → Client | task_update | { taskId, status, error?, timestamps } | Any task state change |
| Server → Client | agent_output | { taskId, chunk } | Streaming agent output |
| Server → Client | task_diff | { taskId, diff } | Task diff captured |
| Server → Client | swarm_complete | { totalTime, succeeded, failed } | All tasks finished |
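
One way to type this protocol is a pair of discriminated unions, sketched below. The flat payload layout, the `type` discriminator, and the shape of `tasks` are assumptions. The `timestamps` field of task_update is left out because its field names are not given here, so the real definitions in types.ts may differ.

```typescript
type WsTaskStatus = "queued" | "running" | "completed" | "failed";

type ClientMessage =
  | { type: "start"; spec: string; repoPath: string; maxConcurrency: number }
  | { type: "cancel" }
  | { type: "retry"; taskId: string };

type ServerMessage =
  | { type: "refining"; spec: string }
  | { type: "refined"; spec: string }
  | { type: "dag"; tasks: { id: string; dependencies: string[] }[] }
  | { type: "task_update"; taskId: string; status: WsTaskStatus; error?: string }
  | { type: "agent_output"; taskId: string; chunk: string }
  | { type: "task_diff"; taskId: string; diff: string }
  | { type: "swarm_complete"; totalTime: number; succeeded: number; failed: number };

// Exhaustive switch: adding a ServerMessage variant without handling it
// becomes a compile error at the `never` assignment.
function describeServerMessage(msg: ServerMessage): string {
  switch (msg.type) {
    case "refining":
      return "Refining spec";
    case "refined":
      return "Spec refined";
    case "dag":
      return `DAG with ${msg.tasks.length} tasks`;
    case "task_update":
      return msg.error
        ? `Task ${msg.taskId} is ${msg.status}: ${msg.error}`
        : `Task ${msg.taskId} is ${msg.status}`;
    case "agent_output":
      return `Output from ${msg.taskId}`;
    case "task_diff":
      return `Diff for ${msg.taskId}`;
    case "swarm_complete":
      return `Done: ${msg.succeeded} succeeded, ${msg.failed} failed`;
    default: {
      const unreachable: never = msg;
      return unreachable;
    }
  }
}

const retryRequest: ClientMessage = { type: "retry", taskId: "C" };
```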

Configuration

All configuration is centralized in config.ts with environment variable validation:

| Variable | Default | Purpose |
|---|---|---|
| OPENAI_API_KEY | required | API key for OpenAI or compatible provider |
| OPENAI_BASE_URL | api.openai.com/v1 | Swap to Ollama, Groq, Together AI, etc. |
| DECOMPOSE_MODEL | o3-mini | Model for DAG generation |
| REFINE_MODEL | o3-mini | Model for spec refinement |
| AGENT_MODEL | o3-mini | Model for agent action planning |
| MAX_CONCURRENCY | 4 | Max parallel Docker containers |
| AGENT_TIMEOUT_MS | 600000 | Per-task timeout (10 min) |

The OPENAI_BASE_URL support means Codex Swarm works with any OpenAI-compatible API — local models via Ollama, cloud providers like Groq or Together AI, or Azure OpenAI deployments.
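
Environment validation along these lines could be sketched as follows. The variable names and defaults come from the table above. The `SwarmConfig` shape, the helper names, and the `https://` scheme on the default base URL are assumptions, and the real config.ts may validate differently.

```typescript
interface SwarmConfig {
  openaiApiKey: string;
  openaiBaseUrl: string;
  decomposeModel: string;
  refineModel: string;
  agentModel: string;
  maxConcurrency: number;
  agentTimeoutMs: number;
}

// Parses an optional positive integer, falling back when the variable is unset.
function parsePositiveInt(
  varName: string,
  value: string | undefined,
  fallback: number,
): number {
  if (value === undefined || value === "") return fallback;
  const n = Number(value);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${varName} must be a positive integer, got "${value}"`);
  }
  return n;
}

// Takes the environment as a plain record so it can be tested without
// touching the real process environment.
function loadConfig(env: Record<string, string | undefined>): SwarmConfig {
  const apiKey = env.OPENAI_API_KEY;
  if (!apiKey) throw new Error("OPENAI_API_KEY is required");
  return {
    openaiApiKey: apiKey,
    openaiBaseUrl: env.OPENAI_BASE_URL || "https://api.openai.com/v1",
    decomposeModel: env.DECOMPOSE_MODEL || "o3-mini",
    refineModel: env.REFINE_MODEL || "o3-mini",
    agentModel: env.AGENT_MODEL || "o3-mini",
    maxConcurrency: parsePositiveInt("MAX_CONCURRENCY", env.MAX_CONCURRENCY, 4),
    agentTimeoutMs: parsePositiveInt("AGENT_TIMEOUT_MS", env.AGENT_TIMEOUT_MS, 600000),
  };
}
```

In a Node.js process this would be called once at startup as `loadConfig(process.env)`, so that a missing key or a malformed number fails fast instead of surfacing in the middle of a run.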

Tech Stack

| Layer | Technology | Rationale |
|---|---|---|
| Runtime | Node.js 22, TypeScript 5 | Native ESM, top-level await, modern APIs |
| HTTP/WS Server | Hono | Lightweight, edge-compatible, built-in WebSocket upgrade |
| Frontend | Svelte 5 (Runes) | Fine-grained reactivity without virtual DOM overhead |
| Styling | Tailwind CSS 4 + shadcn-svelte | Utility-first with accessible component primitives |
| Monorepo | pnpm workspaces | Hoisted dependencies, fast installs |
| Sandboxing | Docker | Process isolation, resource limits, reproducible environments |
| LLM | OpenAI Chat Completions API | JSON mode for structured output, streaming for real-time feedback |

What Makes Codex Swarm Unique

| Capability | Codex Swarm | Typical LLM Orchestrators |
|---|---|---|
| Task modeling | DAG with typed dependency edges | Flat lists or simple chains |
| Scheduling | Event-driven frontier scheduler | Timer-based polling or sequential |
| Parallelism | True concurrent execution bounded by maxConcurrency | Often single-threaded or fire-all |
| Isolation | Docker containers with resource caps per task | Shared process, shared filesystem |
| Conflict avoidance | File-scoped tasks + sandbox copies | Hope for the best |
| Failure handling | Recursive cascading failure + targeted retry | Retry-all or abandon |
| Visibility | Real-time DAG + streaming logs + live diffs | Post-hoc log files |
| LLM integration | Plan-then-execute (LLM plans, Docker executes) | LLM has direct system access |
| Provider flexibility | Any OpenAI-compatible API via OPENAI_BASE_URL | Locked to single provider |

Repository Structure

codex-swarm/
├── packages/
│   ├── server/                  # Orchestration backend
│   │   ├── src/
│   │   │   ├── index.ts         # Hono server, WebSocket routing
│   │   │   ├── orchestrator.ts  # DAG scheduler, task lifecycle
│   │   │   ├── decomposer.ts   # LLM-powered spec→DAG decomposition
│   │   │   ├── agent.ts        # LLM action planner (write_file/shell)
│   │   │   ├── sandbox-runner.ts# Docker container management
│   │   │   ├── config.ts       # Centralized env config
│   │   │   └── types.ts        # Shared type definitions
│   │   └── docker/
│   │       └── Dockerfile.agent # Minimal agent container image
│   └── web/                     # Dashboard frontend
│       └── src/
│           ├── app.css          # Neobrutalist design system
│           ├── lib/
│           │   ├── stores/
│           │   │   └── swarm.svelte.ts  # Reactive state (Svelte 5 Runes)
│           │   └── components/
│           │       ├── TaskGraph.svelte  # DAG visualization (SVG)
│           │       ├── AgentFeed.svelte  # Streaming agent output
│           │       ├── Timeline.svelte   # Gantt-style execution timeline
│           │       ├── DiffViewer.svelte # Syntax-highlighted diffs
│           │       ├── Summary.svelte    # Live progress counters
│           │       └── ExportPanel.svelte# Run data export
│           └── routes/
│               └── +page.svelte # Main dashboard page
└── docs/
    └── END_TO_END_WORKFLOW.md   # Step-by-step runtime walkthrough

License

MIT — see LICENSE.