Part 1 found 18 fairness issues and rewrote the methodology. Part 2 traced the overhead to Pydantic and fixed exports. Part 3 fixed four more bottlenecks across five milestones. This post runs the full suite one last time.
Part 4 of 5. Part 1: The Methodology | Part 2: The Overhead | Part 3: The Fixes | Part 5: The Columnar Baseline
What We Ran
24 scenarios, 20 iterations each, 3 warmup, --storage both, arm alternation, GC isolation. Four scales:
| Scale | Measurements | Runtime |
|---|---|---|
| Medium (50K) | 435 | ~18 min |
| Large (100K) | 432 | ~40 min |
| Xlarge (500K) | 429 | ~3 hrs |
| Xxlarge (1M) | 429 | ~6.5 hrs |
| Total | 1,725 | ~10.5 hrs |
Combined with the pre-optimization validation: 3,288 total measurements across the series.
Environment: Python 3.12.11 | sqler 1.2026.3.5 | SQLite 3.50.4 | Linux x86_64 (8 cores)
The Scorecard

At 1M rows, disk mode:
| Category | Ratio | Verdict |
|---|---|---|
| Bulk insert | 0.89x | Faster than raw sqlite |
| Export JSON | 0.97x | At parity |
| JSONL (no ID) | 0.98x | At parity |
| FTS ranked | 1.00x | Perfect parity |
| Backup/restore | 1.01x | Transparent |
| FTS rebuild | 1.03x | Near parity |
| any_where | 1.04x | Near parity |
| Export JSONL | 1.06x | Near parity |
| Queries | 1.03–1.15x | Stable overhead |
| Aggregates | 1.06–1.13x | Stable overhead |
| JSON ops | 1.07–1.14x | Stable overhead |
| Export CSV | 1.34x | Irreducible |
Everything ≤1.15x except CSV export. The query layer adds 3–15% overhead; that’s the cost of query compilation, the adapter layer, and the row factory artifact documented in Part 1. None of it grows with scale, which matters more than the percentage itself.
Every Optimization Holds
The headline from the cross-scale validation: no regressions at any scale.
| Optimization | 50K | 100K | 500K | 1M | Pre-fix |
|---|---|---|---|---|---|
| Bulk insert (M-3) | 0.96x | 0.91x | 0.91x | 0.89x | 1.87–1.92x |
| any_where (M-2) | 1.02x | 1.03x | 1.05x | 1.04x | 1.47–1.51x |
| FTS rebuild (M-1) | 1.04x | 1.02x | 1.07x | 1.03x | 3.78–4.65x |
| FTS ranked (M-4) | 0.98x | 1.01x | 1.00x | 1.00x | 1.50x at 500K+ |
| Hydration (M-5) | 5.15x | 4.62x | 5.01x | 4.97x | (new capability) |
Bulk insert gets more efficient at scale: 0.96x at 50K → 0.89x at 1M. The multi-row INSERT pattern’s parser savings compound as row count grows; 50 SQL calls versus 1M individual statements.
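The chunked multi-row pattern can be sketched as follows. The `items` table, column names, and chunk size are illustrative, not sqler's internals; one practical constraint worth noting is that rows-per-chunk times columns must stay under SQLite's bound-variable limit (32,766 since SQLite 3.32).

```python
import sqlite3

def bulk_insert_chunked(conn, rows, chunk_size=500):
    """Insert rows via multi-row INSERT: one SQL parse per chunk instead
    of one statement execution per row. chunk_size * columns must stay
    under SQLite's bound-variable limit."""
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        # One statement with len(chunk) value groups: (?, ?), (?, ?), ...
        placeholders = ", ".join(["(?, ?)"] * len(chunk))
        sql = f"INSERT INTO items (name, data) VALUES {placeholders}"
        flat = [v for row in chunk for v in row]  # flatten the parameters
        conn.execute(sql, flat)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, data TEXT)")
bulk_insert_chunked(conn, [(f"n{i}", "{}") for i in range(1_000)], chunk_size=300)
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # → 1000
```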
FTS ranked is the more important validation. At 1M rows: 1,037ms vs 1,033ms. This was the only scenario that previously worsened at scale (from 0.95x at 50K to 1.50x at 1M); the single-JOIN fix from Part 3 eliminated the regression entirely.
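The shape of a single-JOIN ranked query can be sketched like this. The schema is illustrative (not sqler's actual SQL), and it assumes an FTS5-enabled SQLite build:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT);
    CREATE VIRTUAL TABLE docs_fts USING fts5(body);
""")
conn.executemany("INSERT INTO docs (id, data) VALUES (?, ?)",
                 [(1, '{"title": "alpha"}'), (2, '{"title": "beta"}')])
conn.executemany("INSERT INTO docs_fts (rowid, body) VALUES (?, ?)",
                 [(1, "the alpha document"), (2, "a beta document")])

# One statement does everything: FTS match, rank ordering, and the
# base-table fetch, instead of matching first and joining in a second pass.
hits = conn.execute("""
    SELECT docs.id, docs.data
    FROM docs_fts
    JOIN docs ON docs.id = docs_fts.rowid
    WHERE docs_fts MATCH ?
    ORDER BY docs_fts.rank
""", ("beta",)).fetchall()
print(hits)  # → [(2, '{"title": "beta"}')]
```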
Ratio Stability Across Scale

The ratios are flat lines. Bulk insert improves slightly at scale; everything else holds within measurement noise. This means the overhead is proportional: per-row costs, not algorithmic. The ORM doesn’t introduce complexity that grows faster than the underlying SQL work.
A note on query ratios: Part 1’s v1.2 data showed queries at 0.95–0.97x in memory mode, with sqler apparently faster than raw sqlite3. The final suite measures disk mode, where the row factory artifact documented in Part 1 no longer masks the true overhead. The corrected numbers are 1.03–1.15x; this is the real cost of query compilation and adapter wrapping, not a regression introduced by the optimization pass.
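The artifact itself is easy to reproduce with the standard library; a minimal sketch (illustrative table, not the benchmark's actual code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'x')")

# Baseline arm: sqlite3.Row supports name-based access, which costs a
# per-field key lookup on every row.
conn.row_factory = sqlite3.Row
named = conn.execute("SELECT a, b FROM t").fetchone()
print(named["a"], named["b"])  # string-keyed access

# sqler-style arm: plain tuples, accessed by integer index; slightly
# cheaper per field, which skews read comparisons in sqler's favor.
conn.row_factory = None
plain = conn.execute("SELECT a, b FROM t").fetchone()
print(plain[0], plain[1])  # integer-index access
```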
Detailed breakdowns (disk mode)
Queries:
| Type | 50K | 100K | 500K | 1M |
|---|---|---|---|---|
| Equality filter (no index) | 1.13x | 1.21x | 1.24x | 1.14x |
| Range 50% selectivity | 1.15x | 1.15x | 1.15x | 1.11x |
| Complex 5-predicate | 1.03x | 1.06x | 1.04x | 1.08x |
| Top-N (1000) | 1.12x | 1.11x | 1.12x | 1.11x |
The unindexed equality filter shows the highest variance (1.13–1.24x across scales); complex queries with more predicates show less overhead because SQLite spends more time evaluating conditions relative to sqler’s fixed per-query cost.
JSON operations:
| Type | 50K | 100K | 500K | 1M |
|---|---|---|---|---|
| Array contains | 1.05x | 1.05x | 1.05x | 1.09x |
| Array isin | 1.01x | 1.03x | 1.02x | 1.07x |
| Nested field (depth 3) | 1.13x | 1.12x | 1.12x | 1.14x |
Nested field access at depth 3 carries a consistent ~13% overhead; the flat array operations are closer to parity.
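The depth cost is visible in the SQL itself: each path component in the JSON path adds traversal work inside json_extract, on top of the fixed per-query cost. A small sketch with an illustrative document:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (data TEXT)")
doc = {"a": {"b": {"c": 42}}, "tags": ["x", "y"]}
conn.execute("INSERT INTO t VALUES (?)", (json.dumps(doc),))

# Depth-3 access: json_extract walks three path components per row.
val = conn.execute(
    "SELECT json_extract(data, '$.a.b.c') FROM t"
).fetchone()[0]
print(val)  # → 42
```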
Aggregates:
| Type | 50K | 100K | 500K | 1M |
|---|---|---|---|---|
| sum | 1.17x | 1.15x | 1.17x | 1.13x |
| avg | 1.19x | 1.13x | 1.17x | 1.13x |
| min | 1.08x | 1.16x | 1.15x | 1.09x |
| max | 1.11x | 1.07x | 1.12x | 1.06x |
Aggregates converge toward parity at scale; at 1M rows, every aggregate is under 1.13x.
Absolute Wall-Clock Cost at 1M Rows
Ratios tell you proportions; absolute numbers tell you whether to care.
| Operation | sqler | sqlite | Overhead |
|---|---|---|---|
| FTS rebuild | 32.3s | 31.4s | +0.9s |
| Bulk insert | 7.0s | 7.9s | −0.9s |
| any_where | 6.5s | 6.2s | +0.3s |
| Export CSV | 10.3s | 7.7s | +2.6s |
| Complex 5-pred query | 2.4s | 2.2s | +0.2s |
| FTS ranked | 1.04s | 1.03s | +0.01s |
| Equality filter (no idx) | 1.2s | 1.1s | +0.1s |
| Backup | 614ms | 610ms | +4ms |
The largest absolute penalty is CSV export: 2.6 extra seconds on 1M rows for per-field extraction overhead. Everything else is under 1 second of additional cost on a million rows. For queries that return hundreds or thousands of rows (most real workloads), the overhead is single-digit milliseconds.
Bulk insert saves 0.9 seconds at 1M by using chunked multi-row INSERT instead of executemany().
What’s Irreducible
CSV export (1.34x): Per-field extraction from JSON dicts plus _serialize_value() per field. Both arms parse JSON; sqler’s overhead is the dict-to-row conversion that CSV format requires. This could shrink with a C-level JSON-to-CSV converter, but in Python it’s structural.
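The structural cost is the per-cell work. A hedged sketch of the shape of that conversion; serialize() here is a hypothetical stand-in for a _serialize_value()-style helper, not sqler's actual function:

```python
import csv
import io
import json

def serialize(value):
    # Hypothetical stand-in for a per-field serializer: nested values
    # must be re-encoded as JSON strings to fit a flat CSV cell.
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return "" if value is None else value

def dicts_to_csv(dicts, fields):
    """Write JSON-decoded dicts as CSV rows. Every cell is extracted
    and serialized individually, which is the work CSV format forces."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(fields)
    for d in dicts:
        writer.writerow([serialize(d.get(f)) for f in fields])
    return buf.getvalue()

rows = [json.loads('{"name": "a", "tags": [1, 2]}')]
print(dicts_to_csv(rows, ["name", "tags"]))
```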
Query overhead (3–15%): Query compilation, adapter wrapping, and the row factory artifact. At 1M rows with a 5-predicate complex query, that’s 200ms of absolute overhead. For indexed lookups and top-N queries, the absolute overhead is under 10ms; whether that matters depends on whether 10ms matters to your use case.
Aggregate overhead (6–13%): Same sources as query overhead. At 1M rows, the most expensive aggregate (sum) takes 1.13x or about 100ms extra.
The msgspec Read Path
M-5 added SQLerMsgspecModel as an opt-in alternative to SQLerLiteModel. For read-heavy workloads where hydration matters:
| Path | Speed vs Lite default |
|---|---|
| queryset.all() with SQLerLiteModel | 1.0x (baseline) |
| queryset.all() with SQLerMsgspecModel | 1.46x faster |
| queryset.as_dicts() | ~2.8x faster |
The as_dicts() path returns raw dicts with no hydration, fastest when you don’t need model instances. The msgspec path gives full type safety at 1.46x the speed of dataclasses. Details in Part 3.
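The gap comes down to how much work happens per row after the bytes leave SQLite. A minimal illustration of the two extremes, with a plain dataclass standing in for the model path (this is not sqler's code, and msgspec itself is omitted; it sits between these because it decodes straight into typed structs):

```python
import json
import timeit
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

raw = [json.dumps({"name": f"u{i}", "age": i}) for i in range(1_000)]

def as_models():
    # Hydration path: decode JSON, then construct a model per row.
    return [User(**json.loads(r)) for r in raw]

def as_dicts():
    # Dict path: decode only, no per-row object construction.
    return [json.loads(r) for r in raw]

print(timeit.timeit(as_models, number=50))
print(timeit.timeit(as_dicts, number=50))  # typically the faster path
```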
Hydration repeatability (4 independent runs)
| Run context | Pure validate | End-to-end all() (disk) |
|---|---|---|
| Medium suite | 5.15x | 1.47x |
| Large suite | 4.62x | 1.43x |
| Xlarge suite | 5.01x | 1.46x |
| Xxlarge suite | 4.97x | 1.43x |
The hydration benchmark uses fixed 50K rows regardless of the scale parameter. Running it as part of each scale’s full suite provides four independent measurements; the ratios are highly repeatable.
Known Caveats
- Row factory artifact (3–6%): The baseline uses sqlite3.Row with string key access; sqler uses integer index access. This consistently makes sqler appear ~5% faster on reads, which it isn’t. Documented in Part 1.
- Single machine: All measurements on one machine (Linux x86_64, 8 cores). Different hardware, OS, or Python versions may shift ratios.
- No update/delete benchmarks: The suite measures insert and query extensively but doesn’t benchmark update() or delete() at scale.
- No concurrent write contention: Optimistic locking is tested for correctness, not throughput under contention.
- 100K CSV anomaly: One 100K run showed 0.32x for CSV export (the sqlite baseline took 2,979ms instead of the expected ~716ms). Environmental outlier; not reproducible at other scales. Excluded from trend analysis.
Running It Yourself
```bash
# Install with benchmark dependencies
uv sync --all-groups

# Run at medium scale (~20 minutes, memory + disk)
uv run python -m benchmarks run --scale medium --storage both

# Run at large scale (~40 minutes)
uv run python -m benchmarks run --scale large --storage both

# Generate charts from latest results
uv run --group benchmarks python -m benchmarks plot

# List all scenarios
uv run python -m benchmarks list
```
The benchmark suite enforces matched PRAGMAs, arm alternation, and GC isolation automatically. Results are saved as JSON in benchmarks/results/.
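A toy version of that harness, a sketch rather than the suite's actual implementation, shows what arm alternation, warmup, and GC isolation look like in a few lines:

```python
import gc
import statistics
import time

def bench(arms, iterations=20, warmup=3):
    """Minimal paired benchmark: alternate arms within each iteration so
    system drift hits both equally, and keep GC out of the timed region."""
    times = {name: [] for name in arms}
    for fn in arms.values():          # warmup: populate caches, JIT-ish effects
        for _ in range(warmup):
            fn()
    for _ in range(iterations):
        for name, fn in arms.items():  # arm alternation
            gc.collect()
            gc.disable()               # GC isolation around the timed call
            t0 = time.perf_counter()
            fn()
            times[name].append(time.perf_counter() - t0)
            gc.enable()
    return {name: statistics.median(ts) for name, ts in times.items()}

result = bench({"a": lambda: sum(range(10_000)),
                "b": lambda: sum(range(12_000))}, iterations=5, warmup=1)
print(sorted(result))  # → ['a', 'b']
```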
The Journey
This series started with 22 benchmarks that all told the same story: sqler is fast. That story was wrong, or rather, it was the answer to a question nobody asked. “Is sqler fast with better configuration than the baseline?” is not a useful question.
The adversarial audit found 23 fairness issues across four rounds. The fair rewrite made every number worse. The optimization pass fixed five real bottlenecks, two of which turned out to be benchmark bugs, not code bugs.
The final numbers: bulk insert 0.89x (faster than raw), FTS ranked 1.00x (perfect parity), everything else ≤1.15x, one irreducible gap at 1.34x. Confirmed at 50K, 100K, 500K, and 1M rows across 1,725 measurements and 10.5 hours of runtime.
Whether these numbers are good enough depends on what you’re building. For most workloads (the kind where you’re querying a few thousand rows and exporting results), the overhead is under 10ms in absolute terms. For million-row bulk operations, sqler is at parity or faster than the naive executemany() approach. The ORM layer is not free, but the cost is stable, predictable, and small relative to the SQL work underneath it.
Start from the beginning: Part 1: The Methodology.
