
Procler: Designing a CLI for AI Agents


Most CLI tools are designed for humans. Pretty tables, colored output, interactive prompts. An AI agent hits procler status and gets back a formatted table it has to parse with regex. That’s backwards.

Procler started from a different question: what if the primary user is an LLM agent, and humans are the secondary audience?

This post walks through the design decisions that followed from that premise. Not benchmarks — there’s nothing to measure yet. Just the reasoning behind the API shape, and where it worked out differently than expected.


Decision 1: Every Command Returns JSON

The most obvious decision, but the consequences go deeper than json.dumps().

$ procler status api
{
  "success": true,
  "data": {
    "name": "api",
    "status": "running",
    "pid": 48291,
    "uptime_seconds": 3421,
    "linux_state": {
      "state_code": "S",
      "state_name": "sleeping",
      "is_killable": true
    }
  }
}

The shape is always {success, data?, error?, error_code?, suggestion?}. Always. An agent doesn’t need conditional parsing logic — it checks success, reads data or error, done.

What this forced: Every error path had to produce structured output too. No print("something went wrong") followed by sys.exit(1). Every failure includes an error_code (machine-readable) and a suggestion (for the agent to decide what to do next):

{
  "success": false,
  "error": "Process 'api' is already running",
  "error_code": "ALREADY_RUNNING",
  "suggestion": "Use 'procler restart api' to restart, or 'procler status api' to check current state"
}

The suggestion field is the interesting one. It’s not for humans — a human can read the error and figure it out. It’s for an LLM agent that needs to decide its next action without understanding the full mental model of process management.
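A sketch of what this buys on the agent side: because the envelope never changes shape, one parser covers every command. parse_envelope here is a hypothetical helper, not part of procler itself.

```python
import json

def parse_envelope(raw: str):
    """Parse procler's JSON envelope: {success, data?, error?, error_code?, suggestion?}.

    On success, return the data payload. On failure, raise with the
    machine-readable error_code so the caller can dispatch on it without
    parsing natural language.
    """
    result = json.loads(raw)
    if result.get("success"):
        return result.get("data")
    raise RuntimeError(f"{result.get('error_code')}: {result.get('error')}")
```

The same function handles status, start, stop, and every other command, which is the whole point of a fixed envelope.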


Decision 2: Self-Describing Commands

An LLM agent connecting to a new system needs to discover what it can do. Most CLI tools require reading docs or --help output. Procler has a single entry point:

$ procler capabilities
{
  "success": true,
  "data": {
    "commands": [
      {
        "name": "start",
        "args": [{"name": "name", "type": "string", "required": true}],
        "description": "Start a defined process",
        "idempotent": true
      },
      ...
    ]
  }
}

Every command declares its arguments, types, whether it’s idempotent, and what it does. An agent can call capabilities once and build a complete mental model of the system.
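As a sketch of how an agent might use that payload (these helpers are hypothetical, agent-side code, not part of procler): index the commands by name once, then validate proposed calls locally before issuing them.

```python
def build_command_map(capabilities: dict) -> dict:
    """Index the capabilities payload by command name for O(1) lookup."""
    return {cmd["name"]: cmd for cmd in capabilities["commands"]}

def validate_call(command_map: dict, name: str, args: dict) -> list:
    """Return a list of problems with a proposed call; empty means valid."""
    spec = command_map.get(name)
    if spec is None:
        return [f"unknown command: {name}"]
    return [
        f"missing required arg: {arg['name']}"
        for arg in spec["args"]
        if arg["required"] and arg["name"] not in args
    ]
```

An agent can reject a malformed call before it ever spawns a process, instead of round-tripping through an error response.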

procler config explain goes further — it reads the current config and describes it in plain language:

$ procler config explain
{
  "success": true,
  "data": {
    "explanation": "You have 3 processes defined: 'api' runs uvicorn locally, 'worker' runs celery locally and depends on 'api' being healthy, 'redis' runs in Docker container 'my-redis'. The 'backend' group starts them in order: redis → api → worker.",
    "process_count": 3,
    "group_count": 1,
    "recipe_count": 1
  }
}

What we learned: The explain command turned out to be useful for humans too. Reading a 50-line YAML and mentally parsing dependencies is harder than reading a paragraph. This was supposed to be an LLM convenience; it became the fastest way for anyone to understand a project’s process setup.


Decision 3: Idempotent Everything

AI agents retry. Network calls fail. Context windows get truncated mid-operation. If procler start api throws an error when api is already running, the agent needs defensive logic around every call.

Instead: every operation is idempotent. start on a running process returns the current state. stop on a stopped process returns success. define with the same name and command is a no-op.

$ procler start api    # starts the process
$ procler start api    # returns current state, no error
{
  "success": true,
  "data": {
    "name": "api",
    "status": "running",
    "pid": 48291,
    "already_running": true
  }
}

The already_running flag lets the agent know it didn’t cause the start, but the operation succeeded. No error handling needed.
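This is what idempotency buys in practice: an agent can retry blindly. A minimal sketch, where run is a hypothetical callable standing in for a subprocess wrapper that returns the parsed envelope.

```python
def start_with_retry(run, name: str, attempts: int = 3) -> dict:
    """Retry `procler start` without any state inspection.

    Safe precisely because start is idempotent: re-running it against an
    already-running process returns the current state instead of failing.
    """
    last_error = None
    for _ in range(attempts):
        result = run("start", name)
        if result["success"]:
            return result["data"]  # may carry already_running=True
        last_error = result
    raise RuntimeError(last_error["error_code"])
```

Without idempotency, every retry would need a status check first to avoid double-starting.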

Where this broke down: Recipes aren’t idempotent. A deploy recipe that stops → migrates → starts can’t be safely re-run mid-migration. We added --dry-run as a safety valve, but the fundamental problem remains: multi-step operations with side effects resist idempotency. The honest answer is that recipes need a proper state machine with checkpointing, and we don’t have that yet.


Decision 4: Context Abstraction

A process manager that only handles local shells is just a wrapper around subprocess. A process manager that only handles Docker is just a wrapper around docker-py. The interesting problem is managing both with the same interface.

processes:
  api:
    command: uvicorn main:app
    context: local          # subprocess

  postgres:
    command: postgres
    context: docker         # Docker SDK
    container: my-postgres

procler start api and procler start postgres use different execution backends but return identical JSON shapes. The agent doesn’t need to know (or care) which context a process uses.

The abstraction is a Python ABC — ExecutionContext with start(), stop(), status(), logs(). Two implementations: LocalContext (asyncio subprocess) and DockerContext (docker-py SDK). Adding a new context (Podman, containerd, SSH remote) means implementing four methods.
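The shape of that ABC looks roughly like this. Method names follow the post; the signatures and return types are illustrative assumptions, not procler's actual API.

```python
from abc import ABC, abstractmethod

class ExecutionContext(ABC):
    """One interface over execution backends (local subprocess, Docker, ...).

    A new backend (Podman, containerd, SSH remote) only has to implement
    these four methods and return the same normalized dict shapes.
    """

    @abstractmethod
    async def start(self, name: str, command: str) -> dict: ...

    @abstractmethod
    async def stop(self, name: str) -> dict: ...

    @abstractmethod
    async def status(self, name: str) -> dict: ...

    @abstractmethod
    async def logs(self, name: str, lines: int = 100) -> list: ...
```

The methods are async because LocalContext is built on asyncio subprocesses; a Docker backend can wrap its blocking SDK calls in an executor behind the same interface.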

What this cost: Docker container state and subprocess state don’t map cleanly. A container can be “created but not started” — there’s no subprocess equivalent. We had to normalize states across contexts, which means some Docker-specific information gets flattened. The linux_state field only exists for local processes. Trade-off accepted — the unified interface is worth more than perfect Docker fidelity.


Decision 5: Health Probes with Dependency Ordering

This is where process management becomes genuinely useful for dev environments. Starting 5 services in the right order, waiting for each to be ready before starting the next — that’s the thing you do manually every morning.

processes:
  worker:
    command: celery worker
    depends_on:
      - name: api
        condition: healthy    # not just "started"

Two dependency conditions: started (process is running) and healthy (health check passes). The difference matters. A database can be “started” (PID exists) but not “healthy” (still replaying WAL). Starting the API before the database accepts connections leads to the kind of error that wastes 10 minutes of debugging.

Group start walks the dependency graph, starts processes in topological order, and waits for each dependency condition before proceeding. Group stop runs in reverse.
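The ordering itself is a plain topological sort, sketched here with Python's stdlib graphlib. The input format is a simplification of the YAML above: just name-to-dependency-names, with conditions like healthy assumed to be checked at start time rather than during ordering.

```python
from graphlib import TopologicalSorter

def start_order(processes: dict) -> list:
    """Compute group start order so every process starts after its dependencies.

    `processes` maps each name to a list of the names it depends on.
    TopologicalSorter also raises CycleError on circular dependencies,
    which is the failure mode you want caught before anything starts.
    """
    ts = TopologicalSorter(processes)
    return list(ts.static_order())
```

Stop order is simply the reverse of start order, which matches the "group stop runs in reverse" behavior described above.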


The Test Suite

320 tests, all passing. The test structure mirrors the architecture:

Area             Tests   What’s Covered
CLI                 45   Every command, JSON output shape, error codes
Process Manager     62   Start/stop/restart, idempotency, state transitions
Config              38   YAML loading, validation, explain, export/import
Groups              28   Ordered start/stop, dependency resolution
Recipes             24   Step execution, dry-run, error handling
Health              18   Probe execution, dependency conditions
API                 52   REST endpoints, WebSocket subscriptions
Docker              15   Container operations, context switching
TUI                 13   Widget creation, helpers, data loading
Other               25   Snippets, scheduler, replicas, namespaces

The test philosophy: every test asserts exact values, not existence. assert result["status"] == "running", not assert result is not None. If a test can pass when the code is broken, the test is broken.
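A hypothetical test in that style, with fake_start standing in for procler's real fixtures so the example runs on its own:

```python
def fake_start(name: str) -> dict:
    """Minimal stub of a start call; stands in for the real test fixture."""
    return {"success": True,
            "data": {"name": name, "status": "running", "already_running": False}}

def test_start_returns_running_state():
    result = fake_start("api")
    # Exact-value assertions: each one fails if the envelope drifts.
    assert result["success"] is True
    assert result["data"]["status"] == "running"        # not `is not None`
    assert result["data"]["already_running"] is False
```

The weak version, assert result is not None, would still pass if status silently became "crashed". The exact-value version cannot.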


What Didn’t Work

Honest accounting of decisions that didn’t pan out:

  1. The suggestion field is underused. In practice, most agents ignore it and make their own decisions based on error_code. The field is still there, but it’s not the game-changer we expected.

  2. WebSocket subscriptions are complex for agents. Most LLM agents work in request/response cycles. Maintaining a WebSocket connection for real-time updates requires a different programming model that most agent frameworks don’t support well. Polling procler status every few seconds is simpler and works fine.

  3. YAML config is a mixed blessing. Human-readable and version-controllable, but YAML’s gotchas (implicit type coercion, indentation sensitivity) cause config errors that are hard to debug. Pydantic validation catches most of them, but the error messages reference Pydantic internals, not the YAML line that’s wrong.

  4. TUI needs more work. The Textual-based TUI works for monitoring but the start/stop/restart actions aren’t as battle-tested as the CLI. It’s a v1 — functional but not polished.
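The YAML gotcha in point 3 is worth one concrete illustration. Under YAML 1.1 rules, a hypothetical config like this silently changes types:

```yaml
processes:
  api:
    command: uvicorn main:app
    env:
      DEBUG: no        # parsed as boolean false, not the string "no"
      VERSION: 1.10    # parsed as the float 1.1, losing the trailing zero
```

Both values look like strings to the person writing them; Pydantic then reports a type mismatch against its model fields rather than pointing at the offending YAML line.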


Getting Started

pip install procler               # or: uv add procler
procler config init               # creates .procler/config.yaml
procler capabilities              # see what's available

The source is at github.com/gabu-quest/procler. Python 3.12+, 320 tests, MIT license.