proofler: 430 Checks Across Five Libraries, Zero Workarounds
Integration testing five SQLite-backed Python libraries through their public APIs. 16 bugs found and fixed at the source. What the seams between libraries actually look like under chaos.
9 posts
Parts 1–4 measured ORM overhead: sqler vs raw sqlite, both using JSON storage. This post asks a different question: what does the document-oriented architecture itself cost? Equality filter: 11x. Aggregates: 9.5x. JSONL export: 1.0x.
Hierarchy building from 86 seconds to 332ms (259x). Search from 3.77s to 31ms (121x). Memory from 83 MB to 1 MB (83x). Two measurement bugs caught before they shipped.
sqler vs Celery+Redis — 7 scenarios, 3 rounds, and one embarrassing 12x gap we found by benchmarking ourselves honestly.
1,725 measurements, 4 scales, 10.5 hours. Bulk insert 0.89x (faster than raw sqlite). FTS ranked 1.00x. Everything else ≤1.15x. One irreducible gap at 1.34x.
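A bulk insert beating raw sqlite usually comes down to batching all writes inside one transaction. A minimal stdlib sketch of that pattern (the `docs` table and its schema are illustrative, not sqler's actual internals):

```python
import sqlite3

def bulk_insert(conn: sqlite3.Connection, docs: list[str]) -> int:
    """Insert many JSON documents in a single transaction via executemany."""
    with conn:  # one transaction (one fsync) instead of one per row
        conn.execute(
            "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT)"
        )
        conn.executemany(
            "INSERT INTO docs (body) VALUES (?)", [(d,) for d in docs]
        )
    return conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]

conn = sqlite3.connect(":memory:")
print(bulk_insert(conn, ['{"i": %d}' % i for i in range(1000)]))  # 1000
```

Row-at-a-time autocommit inserts pay a journal flush per statement; wrapping the `executemany` in one transaction amortizes that cost across the whole batch.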
FTS rebuild from 4.65x to 1.03x (benchmark bug). Bulk insert from 1.9x to 0.89x (faster than raw sqlite). FTS ranked from 1.50x to 1.00x (single JOIN). Plus msgspec at 5.1x hydration.
Fair benchmarks pointed at the real bottleneck: Pydantic validation at 1,600ns/row. Bypass it for exports and the 2.8x gap drops to parity. The msgspec question remains open.
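The export fast path amounts to skipping per-row model hydration: read tuples from SQLite and write JSONL directly, with no validated object in between. A hedged stdlib sketch of the idea (the table layout is an assumption, not sqler's schema):

```python
import io
import json
import sqlite3

def export_jsonl(conn: sqlite3.Connection, out: io.TextIOBase) -> int:
    """Stream rows to JSONL without constructing validated model objects."""
    n = 0
    # Each row is serialized straight from the tuple: no per-row
    # Pydantic validation on the export path.
    for doc_id, body in conn.execute("SELECT id, body FROM docs"):
        out.write(json.dumps({"id": doc_id, **json.loads(body)}) + "\n")
        n += 1
    return n

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [('{"name": "a"}',), ('{"name": "b"}',)],
)
buf = io.StringIO()
print(export_jsonl(conn, buf))  # 2
```

Validation earns its cost on untrusted input; for data the library itself wrote, the export path can reasonably trust the rows and skip it.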
We wrote 22 benchmarks; every one was biased in our favor. An adversarial audit found 18 fairness issues. The rewrite made every number worse but useful.
Three targeted optimizations cut hierarchy building from 86 seconds to 349ms. A BTreeSet prefix index, cached investigators, and capped sampling — with before/after measurements proving every claim.
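The BTreeSet prefix index in that post is an ordered-set structure; a Python analogue of the same idea uses stdlib `bisect` over a sorted list: seek to the prefix in O(log n), then walk only the matching range (the path data here is illustrative):

```python
import bisect

# A sorted list stands in for an ordered set (e.g. Rust's BTreeSet).
paths = sorted(["a/b/c", "a/b/d", "a/x", "b/q", "b/q/r"])

def children_with_prefix(sorted_paths: list[str], prefix: str) -> list[str]:
    """Range-scan the ordered index: binary-search seek, then walk matches."""
    lo = bisect.bisect_left(sorted_paths, prefix)
    out = []
    for p in sorted_paths[lo:]:
        if not p.startswith(prefix):
            break  # past the prefix range; stop instead of scanning everything
        out.append(p)
    return out

print(children_with_prefix(paths, "a/b/"))  # ['a/b/c', 'a/b/d']
```

This is how an ordered index turns "find every node under this path" from a full scan into a seek plus a short walk, which is the shape of the 86s → 349ms hierarchy fix described above.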