Each library in the -ler stack has its own test suite. sqler tests its ORM; qler tests its job queue; logler tests its log search; procler tests its process manager; dagler tests its DAG orchestrator. Every one of them passes in isolation.
proofler asks the harder question: do they actually work together?
## What proofler tests
The suite runs 41 sections and 430+ checks through the real, installed libraries. No mocks, no stubs, no simulated interfaces. Every test calls the public API of at least two libraries; most involve three or four.
The layers under test:
| Library | Role |
|---|---|
| sqler | SQLite ORM with async, connection pool, optimistic locking |
| qler | Background job queue with workers, cron, rate limiting, dependencies |
| logler | Log investigation with Rust backend, correlation IDs, DB bridge |
| procler | Process management with health checks, crash detection, recovery |
| dagler | DAG pipeline orchestration with fan-out/reduce, retry, cancellation |
All five backed by SQLite. No Postgres, no Redis, no external dependencies.
## The philosophy
One rule: fix bugs at the source.
When proofler catches a bug, we don’t wrap it in try/except, skip the test, or add a fallback. We fix it in the library that owns the bug. The test keeps asserting the correct behavior; the library ships the fix.
Over eight milestones, this approach found and fixed 16 bugs across the stack:
| Library | Bugs found | Examples |
|---|---|---|
| sqler | 3 | Connection pool exhaustion, count aggregation, promoted column handling |
| qler | 4 | Connection leaks in dependency resolution, stale attempt IDs, batch dedup |
| logler | 5 | Wrong DB mappings, empty __init__.py shadowing Rust extension, search OOM, SQL engine O(N²) |
| dagler | 3 | Cancel missing dynamic fan-out jobs, schema timing, cancel status persistence |
| procler | 1 | DDL operations using read-only query path |
Every fix has a regression test that fails if the bug returns. The fixes aren’t theoretical; they shipped to the library repos and are verified on every proofler run.
## What the seams look like
The interesting bugs aren’t inside individual libraries. They’re at the boundaries.
### logler + qler: schema assumptions
logler’s db_to_jsonl() auto-detects tables in a SQLite database and converts them to JSONL for investigation. It assumed every table has an _id column (sqler’s convention). qler’s qler_job_deps table doesn’t; it’s a pure relational join table. Auto-detection crashed.
The fix: skip tables without _id during auto-detection. A one-line change in logler that would never surface in logler’s own tests because logler’s tests don’t use qler’s schema.
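The guard is easy to sketch with the standard library. This is a minimal illustration of the idea, not logler’s actual implementation: enumerate tables, inspect their columns via PRAGMA table_info, and skip any table that lacks the _id column sqler’s convention guarantees.

```python
import sqlite3

def tables_with_id(db_path: str) -> list[str]:
    """Return tables that carry sqler's _id column, skipping pure join
    tables (like qler_job_deps) during auto-detection.

    Sketch only -- logler's real auto-detection differs, but the
    guard is the same idea.
    """
    con = sqlite3.connect(db_path)
    try:
        tables = [row[0] for row in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        keep = []
        for table in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt, pk)
            cols = {row[1] for row in con.execute(f'PRAGMA table_info("{table}")')}
            if "_id" in cols:  # sqler convention; join tables lack it
                keep.append(table)
        return keep
    finally:
        con.close()
```

With this guard in place, a database containing both ORM tables and bare join tables converts cleanly instead of crashing mid-detection.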
### procler + sqler: read-only enforcement
sqler’s execute_sql() method validates that queries are read-only (SELECT, EXPLAIN, PRAGMA, WITH). procler used execute_sql() for DDL operations (CREATE TABLE, ALTER TABLE) during database initialization. This worked in procler’s tests because they used a different database setup path; it broke the moment procler initialized through sqler’s real API.
The fix: procler now uses adapter.execute() for DDL, which bypasses the read-only check. A legitimate use of the lower-level API that sqler exposes for exactly this purpose.
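A read-only gate of this class can be sketched in a few lines. This is an illustration of the kind of check described, not sqler’s actual validation logic, which is presumably stricter:

```python
READ_ONLY_PREFIXES = ("SELECT", "EXPLAIN", "PRAGMA", "WITH")

def assert_read_only(sql: str) -> None:
    """Refuse any statement that isn't a read.

    Sketch of the guard sqler's execute_sql() enforces; DDL such as
    CREATE TABLE must go through the lower-level execute path instead.
    """
    stripped = sql.strip()
    first = stripped.split(None, 1)[0].upper() if stripped else ""
    if first not in READ_ONLY_PREFIXES:
        raise ValueError(f"read-only API refused statement: {first or '<empty>'}")
```

The failure mode in procler was exactly this: CREATE TABLE hit the read-only path and was rejected, which only showed up once initialization went through the real API.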
### dagler + qler: dynamic job cancellation
dagler’s cancel() iterates the jobs created at submission time. But fan-out DAGs create jobs dynamically through a dispatcher; those jobs aren’t in the original submission list. Cancelling a running fan-out left 20 map jobs still pending in the database.
The fix required dagler to query qler’s job table by correlation ID after cancelling known jobs, picking up any dynamically created work. A cross-library interaction that only surfaces when you run fan-out under real cancellation pressure.
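The correlation-ID sweep can be sketched as a single UPDATE. The table and column names below (qler_jobs, correlation_id, status) are assumptions for illustration, not qler’s actual schema:

```python
import sqlite3

def cancel_run(con: sqlite3.Connection, corr_id: str) -> int:
    """Cancel every outstanding job for a run, including map jobs the
    dispatcher created after submission.

    Sketch of the dagler fix; schema names are hypothetical.
    """
    cur = con.execute(
        "UPDATE qler_jobs SET status = 'cancelled' "
        "WHERE correlation_id = ? AND status IN ('pending', 'running')",
        (corr_id,),
    )
    con.commit()
    return cur.rowcount  # dynamically spawned fan-out jobs are caught too
```

Because the sweep keys on correlation ID rather than the submission-time job list, jobs that did not exist when cancel() started are still caught.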
## The order pipeline: 600 jobs under chaos
Section 13 runs a full order-processing pipeline: 100 orders, each spawning 6 tasks (validate, fraud check, charge, confirm, inventory, warehouse) with dependency chains. Seeded chaos injects failures at realistic rates: 2.5% validation failures, 4.5% fraud rejections, 11.2% charge declines.
Two workers process the pipeline. Midway through, Worker A shuts down; qler’s lease recovery detects the expired leases and Worker B picks up the abandoned jobs. Every order either completes or fails with a traceable reason.
logler traces the full lifecycle: per-order correlation IDs thread through both log files and the qler database. A failed charge cascades to cancel downstream tasks; logler’s Investigator can diagnose exactly which step failed, why, and which downstream jobs were affected.
27 checks in this section alone. Zero tolerance for data loss or untraceable failures.
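Seeded chaos is what makes failures at these rates reproducible rather than flaky. A minimal sketch of the pattern (the rates are from the section above; the helper names are hypothetical):

```python
import random

# Chaos rates from the order pipeline section.
RATES = {"validate": 0.025, "fraud": 0.045, "charge": 0.112}

def make_chaos(seed: int = 42):
    """Deterministic failure injector: the same seed replays the exact
    same failure pattern on every run, so assertions can be strict."""
    rng = random.Random(seed)

    def should_fail(step: str) -> bool:
        return rng.random() < RATES[step]

    return should_fail
```

Two injectors built from the same seed produce identical failure sequences, which is what lets a chaos run assert exact outcomes instead of tolerating ranges.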
## procler: real OS subprocess management
Sections 14-19 test procler managing qler workers as actual OS subprocesses. Not mock processes; real asyncio.create_subprocess_exec calls with PIDs, signals, and exit codes.
Health checks transition through STARTING → HEALTHY → DEAD. Crash detection sends SIGKILL to a running worker, verifies procler detects the unexpected exit, then confirms a replacement worker recovers the abandoned jobs. CID tracing verifies that correlation IDs thread through procler’s management layer into qler’s job execution and logler’s investigation output.
The full stack roundtrip in Section 19: procler starts a worker → qler processes 20 jobs → logler traces everything → proofler verifies the correlation chain from procler’s spawn event through qler’s job completion to logler’s search results.
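The health lifecycle above can be sketched as a small state machine. This illustrates the transitions proofler checks, not procler’s actual implementation:

```python
from enum import Enum

class Health(Enum):
    STARTING = "starting"
    HEALTHY = "healthy"
    DEAD = "dead"

# Legal transitions: STARTING may become HEALTHY or DEAD (failed boot),
# HEALTHY may only become DEAD, DEAD is terminal. Sketch only.
ALLOWED = {
    Health.STARTING: {Health.HEALTHY, Health.DEAD},
    Health.HEALTHY: {Health.DEAD},
    Health.DEAD: set(),
}

def transition(current: Health, new: Health) -> Health:
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new
```

Making illegal transitions raise is the point: a process that reports HEALTHY after being observed DEAD is a bug in the manager, not a recovery.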
## dagler: DAG orchestration at scale
Sections 25-41 push dagler from basic linear pipelines through diamond DAGs, fan-out/reduce, concurrent runs, and operational edge cases.
The scale tests are where things get interesting. Fan-out throughput follows a clear curve:
| Fan-out size | Elapsed | items/s |
|---|---|---|
| 100 items | ~1s | ~90 |
| 500 items | ~4s | ~120 |
| 1,000 items | ~8s | ~120 |
| 5,000 items | ~70s | 72 |
Throughput peaks at 500-1K items, then drops at 5K due to WAL write pressure from 5,000 map jobs all writing results simultaneously. This is a documented characteristic, not a bug; the test verifies the exact reduce result is correct regardless of throughput.
The operational edge cases (S36-S41) test cancel-mid-flight, retry with map-level failures, idempotent submission dedup, wait() timeout recovery, and concurrent fan-out contention. These are the scenarios that break pipelines in production; proofler catches the failure modes before they ship.
## Stress testing: the throughput numbers
stress.py runs an 8-task dependency pipeline per order with chaos injection, worker churn, zombie recovery, and abrupt kills. Throughput numbers from this box (config is workers × per-worker concurrency):
| Scale | Config | Throughput | p50 | p95 | p99 | Peak RSS | Checks |
|---|---|---|---|---|---|---|---|
| 200 orders (1,600 jobs) | 1w × c=2 | 860 jobs/min | 50ms | 105ms | 133ms | 62 MB | 50/50 |
| 1,000 orders (8,000 jobs) | 1w × c=4 | 2,145 jobs/min | 60ms | 139ms | 186ms | 152 MB | 50/50 |
| 10,000 orders (80,000 jobs) | 2w × c=1 | 1,245 jobs/min | 73ms | 144ms | 204ms | 1,370 MB | 50/50 |
All runs use seed=42 for reproducibility. Chaos rates: 2.5% validate failure, 4.5% fraud failure, 11.2% charge failure. Worker churn includes graceful shutdown + restart and one abrupt SIGKILL per run at the 10K scale.
Throughput increases from 200 to 1,000 orders; the batch claim pipeline stays fuller at larger scale, and the reverse dependency index keeps resolution at O(1) regardless of table size. At 10K, throughput drops to 1,245 jobs/min — the WAL file grows with 80K active rows, and SQLite’s single-writer constraint starts to bite. Processing RSS held steady at 1,370 MB across 55+ worker cycles with zero memory growth; the 2.6 GB peak was entirely logler’s analysis phase reading 600K+ log entries.
213 zombie jobs injected at 10K, 213 recovered. Zero data loss across all configurations. Every failed order traceable through logler.
## What proofler is not
It’s not a library. You can’t pip install proofler. It’s a test harness that requires local checkouts of all five -ler libraries, symlinked into a deps/ directory. Setup takes about 30 seconds if you already have the repos cloned.
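The symlink step might look like the following. The sibling-directory layout and repo names are assumptions; check the repo’s own setup instructions for the authoritative version:

```shell
# Assumed layout: the five -ler repos cloned as siblings of proofler/.
# deps/ holds symlinks so proofler imports the local checkouts directly.
mkdir -p deps
for lib in sqler qler logler procler dagler; do
    # -sfn: symbolic, force-replace, don't follow an existing link
    ln -sfn "../../$lib" "deps/$lib"
done
ls -l deps
```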
It’s not a CI system. The suite takes 6-10 minutes to run depending on hardware, and some sections (5K fan-out, 10K fan-out) are deliberately slow to test performance cliffs. It’s designed for development-time validation, not automated gates.
What it is: proof that five independently developed libraries, sharing nothing but SQLite files and Python’s import system, compose into a working stack. The bugs it finds are the ones that unit tests structurally cannot catch.
## Source
github.com/gabu-quest/proofler — 41 sections, 430+ checks, zero workarounds.