proofler: 430 Checks Across Five Libraries, Zero Workarounds
Integration testing five SQLite-backed Python libraries through their public APIs. 16 bugs found and fixed at the source. What the seams between libraries actually look like under chaos.
8 posts
Parts 1–4 measured ORM overhead: sqler vs raw sqlite, both using JSON storage. This post asks a different question. What does the document-oriented architecture itself cost? Equality filter: 11x. Aggregates: 9.5x. JSONL export: 1.0x.
Hierarchy building from 86 seconds to 332ms (259x). Search from 3.77s to 31ms (121x). Memory from 83 MB to 1 MB (83x). Two measurement bugs caught before they shipped.
sqler vs Celery+Redis — 7 scenarios, 3 rounds, and one embarrassing 12x gap we found by benchmarking ourselves honestly.
1,725 measurements, 4 scales, 10.5 hours. Bulk insert 0.89x (faster than raw sqlite). FTS ranked 1.00x. Everything else ≤1.15x. One irreducible gap at 1.34x.
Fair benchmarks pointed at the real bottleneck: Pydantic validation at 1,600ns/row. Bypass it for exports and the 2.8x gap drops to parity. The msgspec question remains open.
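The bypass in that post amounts to skipping model validation on export paths. A minimal sketch of the general technique using Pydantic v2's `model_construct()`, with a hypothetical `Row` model standing in for the real schema (not sqler's actual API):

```python
from pydantic import BaseModel

class Row(BaseModel):
    # Hypothetical two-field model for illustration only.
    id: int
    name: str

data = {"id": 1, "name": "alice"}

validated = Row(**data)            # normal path: every field is validated
raw = Row.model_construct(**data)  # export path: validation is skipped

# Both produce identical serialized output; only the per-row cost differs.
assert validated.model_dump() == raw.model_dump()
```

`model_construct()` trusts its inputs, so it only belongs on paths where the data has already been validated once, such as rows read back from the database for export.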
We wrote 22 benchmarks; every one was biased in our favor. An adversarial audit found 18 fairness issues. The rewrite made every number worse but useful.
What changes when your primary user is an LLM. Structured JSON output, self-describing commands, idempotent operations, and a config system that explains itself — design decisions from building an LLM-first process manager.