LLM Manifesto, March 2026 (Subject to Change)
I don't care about AI. I care about outcomes. A process owner's beliefs about agents, autonomy, and where the bottleneck actually moved.
I built a browser-based SSH terminal manager because juggling tmux sessions across four machines via Alt-Tab was making me type commands into the wrong box. Then I started routing LLM agents through it.
Integration testing five SQLite-backed Python libraries through their public APIs. 16 bugs found and fixed at the source. What the seams between libraries actually look like under chaos.
Parts 1–4 measured ORM overhead: sqler vs raw sqlite, both using JSON storage. This post asks a different question: what does the document-oriented architecture itself cost? Equality filter: 11x. Aggregates: 9.5x. JSONL export: 1.0x.
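For a sense of what that 11x equality filter is measuring, here is a minimal sketch of the two shapes being compared. This is generic SQLite, not sqler's actual schema; the `docs`/`rows` tables and the `status` field are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Document-oriented: one JSON blob per row, no typed columns.
    CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT NOT NULL);
    -- Column-oriented baseline: the same field as an indexed column.
    CREATE TABLE rows (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
    CREATE INDEX rows_status ON rows (status);
""")
con.executemany("INSERT INTO docs (data) VALUES (?)",
                [('{"status": "active"}',)] * 1000)
con.executemany("INSERT INTO rows (status) VALUES (?)",
                [("active",)] * 1000)

# Document side: json_extract() parses every candidate row's JSON.
con.execute("SELECT count(*) FROM docs "
            "WHERE json_extract(data, '$.status') = ?", ("active",)).fetchone()

# Column side: a plain index lookup.
con.execute("SELECT count(*) FROM rows "
            "WHERE status = ?", ("active",)).fetchone()
```

Per-row JSON parsing versus an index seek is roughly where a gap like 11x comes from.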
Hierarchy building from 86 seconds to 332ms (259x). Search from 3.77s to 31ms (121x). Memory from 83 MB to 1 MB (83x). Two measurement bugs caught before they shipped.
qler vs Celery+Redis — 7 scenarios, 3 rounds, and one embarrassing 12x gap we found by benchmarking ourselves honestly.
1,725 measurements, 4 scales, 10.5 hours. Bulk insert 0.89x (faster than raw sqlite). FTS ranked 1.00x. Everything else ≤1.15x. One irreducible gap at 1.34x.
FTS rebuild from 4.65x to 1.03x (benchmark bug). Bulk insert from 1.9x to 0.89x (faster than raw sqlite). FTS ranked from 1.50x to 1.00x (single JOIN). Plus msgspec at 5.1x hydration.
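The "single JOIN" fix is worth a sketch. This is generic FTS5, not the library's actual query (table names hypothetical): the FTS index does the matching and bm25 ranking, and one join by rowid fetches the stored rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    -- External-content FTS5 index over docs.body, keyed by docs.id.
    CREATE VIRTUAL TABLE docs_fts USING fts5(
        body, content='docs', content_rowid='id'
    );
""")
con.execute("INSERT INTO docs (body) VALUES (?)",
            ("full text search in sqlite",))
# External-content tables are synced manually (or via triggers).
con.execute("INSERT INTO docs_fts (rowid, body) SELECT id, body FROM docs")

# Ranked results with one JOIN back to the base table: the FTS index
# handles matching and ordering, the join only fetches the rows.
rows = con.execute("""
    SELECT docs.id, docs.body
    FROM docs_fts JOIN docs ON docs.id = docs_fts.rowid
    WHERE docs_fts MATCH ?
    ORDER BY docs_fts.rank
""", ("search",)).fetchall()
```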
Fair benchmarks pointed at the real bottleneck: Pydantic validation at 1,600ns/row. Bypass it for exports and the 2.8x gap drops to parity. The msgspec question remains open.
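Whether the export path uses exactly this mechanism isn't spelled out here, but in Pydantic v2 the standard way to skip validation is `model_construct()`. A minimal sketch, with a hypothetical `Row` model:

```python
from pydantic import BaseModel

class Row(BaseModel):
    id: int
    name: str

raw = {"id": 1, "name": "ada"}

# Normal path: full validation on every row (the ~1,600ns/row cost).
validated = Row(**raw)

# Export path: model_construct() skips validation entirely, on the
# grounds that the data just came out of our own database.
trusted = Row.model_construct(**raw)
```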
We wrote 22 benchmarks; every one was biased in our favor. An adversarial audit found 18 fairness issues. The rewrite made every number worse but useful.