Part 1 found 18 fairness issues and rewrote the methodology. Part 2 traced the overhead to Pydantic and fixed exports. Part 3 fixed four more bottlenecks across five milestones. This post runs the full suite one last time.
Part 4 of 5. Part 1: The Methodology | Part 2: The Overhead | Part 3: The Fixes | Part 5: The Columnar Baseline
What We Ran
24 scenarios, 20 iterations each, 3 warmup, --storage both, arm alternation, GC isolation. Four scales:
| Scale | Measurements | Runtime |
|---|---|---|
| Medium (50K) | 435 | ~18 min |
| Large (100K) | 432 | ~40 min |
| Xlarge (500K) | 429 | ~3 hrs |
| Xxlarge (1M) | 429 | ~6.5 hrs |
| Total | 1,725 | ~10.5 hrs |
Combined with the pre-optimization validation: 3,288 total measurements across the series.
Environment: Python 3.12.11 | sqler 1.2026.3.5 | SQLite 3.50.4 | Linux x86_64 (8 cores)
The Scorecard

At 1M rows, disk mode:
| Category | Ratio | Verdict |
|---|---|---|
| Bulk insert | 0.89x | Faster than raw sqlite |
| Export JSON | 0.97x | At parity |
| JSONL (no ID) | 0.98x | At parity |
| FTS ranked | 1.00x | Perfect parity |
| Backup/restore | 1.01x | Transparent |
| FTS rebuild | 1.03x | Near parity |
| any_where | 1.04x | Near parity |
| Export JSONL | 1.06x | Near parity |
| Queries | 1.03–1.15x | Stable overhead |
| Aggregates | 1.06–1.13x | Stable overhead |
| JSON ops | 1.07–1.14x | Stable overhead |
| Export CSV | 1.34x | Irreducible |
Everything ≤1.15x except CSV export. The query layer adds 3–15% overhead; that’s the cost of query compilation, the adapter layer, and the row factory artifact documented in Part 1. None of it grows with scale, which matters more than the percentage itself.
Every Optimization Holds
The headline from the cross-scale validation: no regressions at any scale.
| Optimization | 50K | 100K | 500K | 1M | Pre-fix |
|---|---|---|---|---|---|
| Bulk insert (M-3) | 0.96x | 0.91x | 0.91x | 0.89x | 1.87–1.92x |
| any_where (M-2) | 1.02x | 1.03x | 1.05x | 1.04x | 1.47–1.51x |
| FTS rebuild (M-1) | 1.04x | 1.02x | 1.07x | 1.03x | 3.78–4.65x |
| FTS ranked (M-4) | 0.98x | 1.01x | 1.00x | 1.00x | 1.50x at 500K+ |
| Hydration (M-5) | 5.15x | 4.62x | 5.01x | 4.97x | (new capability) |
Bulk insert gets more efficient at scale: 0.96x at 50K → 0.89x at 1M. The multi-row INSERT pattern’s parser savings compound as row count grows; 50 SQL calls versus 1M individual statements.
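The chunked multi-row pattern can be sketched as follows. The `items` table, column names, and chunk size are illustrative, not sqler's internals; one practical constraint worth noting is that rows-per-chunk times columns must stay under SQLite's bound-variable limit (32,766 since SQLite 3.32).

```python
import sqlite3

def bulk_insert_chunked(conn, rows, chunk_size=500):
    """Insert rows via multi-row INSERT: one SQL parse per chunk instead
    of one statement execution per row. chunk_size * columns must stay
    under SQLite's bound-variable limit."""
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        # One statement with len(chunk) value groups: (?, ?), (?, ?), ...
        placeholders = ", ".join(["(?, ?)"] * len(chunk))
        sql = f"INSERT INTO items (name, data) VALUES {placeholders}"
        flat = [v for row in chunk for v in row]  # flatten the parameters
        conn.execute(sql, flat)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, data TEXT)")
bulk_insert_chunked(conn, [(f"n{i}", "{}") for i in range(1_000)], chunk_size=300)
print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # → 1000
```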
FTS ranked is the more important validation. At 1M rows: 1,037ms vs 1,033ms. This was the only scenario that previously worsened at scale (from 0.95x at 50K to 1.50x at 1M); the single-JOIN fix from Part 3 eliminated the regression entirely.
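The shape of a single-JOIN ranked query can be sketched like this. The schema is illustrative (not sqler's actual SQL), and it assumes an FTS5-enabled SQLite build:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE docs (id INTEGER PRIMARY KEY, data TEXT);
    CREATE VIRTUAL TABLE docs_fts USING fts5(body);
""")
conn.executemany("INSERT INTO docs (id, data) VALUES (?, ?)",
                 [(1, '{"title": "alpha"}'), (2, '{"title": "beta"}')])
conn.executemany("INSERT INTO docs_fts (rowid, body) VALUES (?, ?)",
                 [(1, "the alpha document"), (2, "a beta document")])

# One statement does everything: FTS match, rank ordering, and the
# base-table fetch, instead of matching first and joining in a second pass.
hits = conn.execute("""
    SELECT docs.id, docs.data
    FROM docs_fts
    JOIN docs ON docs.id = docs_fts.rowid
    WHERE docs_fts MATCH ?
    ORDER BY docs_fts.rank
""", ("beta",)).fetchall()
print(hits)  # → [(2, '{"title": "beta"}')]
```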
Ratio Stability Across Scale

The ratios are flat lines. Bulk insert improves slightly at scale; everything else holds within measurement noise. This means the overhead is proportional: per-row costs, not algorithmic. The ORM doesn’t introduce complexity that grows faster than the underlying SQL work.
A note on query ratios: Part 1’s v1.2 data showed queries at 0.95–0.97x in memory mode, with sqler apparently faster than raw sqlite3. The final suite measures disk mode, where the row factory artifact documented in Part 1 no longer masks the true overhead. The corrected numbers are 1.03–1.15x; this is the real cost of query compilation and adapter wrapping, not a regression introduced by the optimization pass.
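The artifact itself is easy to reproduce with the standard library; a minimal sketch (illustrative table, not the benchmark's actual code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'x')")

# Baseline arm: sqlite3.Row supports name-based access, which costs a
# per-field key lookup on every row.
conn.row_factory = sqlite3.Row
named = conn.execute("SELECT a, b FROM t").fetchone()
print(named["a"], named["b"])  # string-keyed access

# sqler-style arm: plain tuples, accessed by integer index; slightly
# cheaper per field, which skews read comparisons in sqler's favor.
conn.row_factory = None
plain = conn.execute("SELECT a, b FROM t").fetchone()
print(plain[0], plain[1])  # integer-index access
```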
Detailed breakdowns (disk mode)
Queries:
| Type | 50K | 100K | 500K | 1M |
|---|---|---|---|---|
| Equality filter (no index) | 1.13x | 1.21x | 1.24x | 1.14x |
| Range 50% selectivity | 1.15x | 1.15x | 1.15x | 1.11x |
| Complex 5-predicate | 1.03x | 1.06x | 1.04x | 1.08x |
| Top-N (1000) | 1.12x | 1.11x | 1.12x | 1.11x |
The unindexed equality filter shows the highest variance (1.13–1.24x across scales); complex queries with more predicates show less overhead because SQLite spends more time evaluating conditions relative to sqler’s fixed per-query cost.
JSON operations:
| Type | 50K | 100K | 500K | 1M |
|---|---|---|---|---|
| Array contains | 1.05x | 1.05x | 1.05x | 1.09x |
| Array isin | 1.01x | 1.03x | 1.02x | 1.07x |
| Nested field (depth 3) | 1.13x | 1.12x | 1.12x | 1.14x |
Nested field access at depth 3 carries a consistent ~13% overhead; the flat array operations are closer to parity.
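The depth cost is visible in the SQL itself: each path component in the JSON path adds traversal work inside json_extract, on top of the fixed per-query cost. A small sketch with an illustrative document:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (data TEXT)")
doc = {"a": {"b": {"c": 42}}, "tags": ["x", "y"]}
conn.execute("INSERT INTO t VALUES (?)", (json.dumps(doc),))

# Depth-3 access: json_extract walks three path components per row.
val = conn.execute(
    "SELECT json_extract(data, '$.a.b.c') FROM t"
).fetchone()[0]
print(val)  # → 42
```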
Aggregates:
| Type | 50K | 100K | 500K | 1M |
|---|---|---|---|---|
| sum | 1.17x | 1.15x | 1.17x | 1.13x |
| avg | 1.19x | 1.13x | 1.17x | 1.13x |
| min | 1.08x | 1.16x | 1.15x | 1.09x |
| max | 1.11x | 1.07x | 1.12x | 1.06x |
Aggregates converge toward parity at scale; at 1M rows, every aggregate is under 1.13x.
Absolute Wall-Clock Cost at 1M Rows
Ratios tell you proportions; absolute numbers tell you whether to care.
| Operation | sqler | sqlite | Overhead |
|---|---|---|---|
| FTS rebuild | 32.3s | 31.4s | +0.9s |
| Bulk insert | 7.0s | 7.9s | −0.9s |
| any_where | 6.5s | 6.2s | +0.3s |
| Export CSV | 10.3s | 7.7s | +2.6s |
| Complex 5-pred query | 2.4s | 2.2s | +0.2s |
| FTS ranked | 1.04s | 1.03s | +0.01s |
| Equality filter (no idx) | 1.2s | 1.1s | +0.1s |
| Backup | 614ms | 610ms | +4ms |
The largest absolute penalty is CSV export: 2.6 extra seconds on 1M rows for per-field extraction overhead. Everything else is under 1 second of additional cost on a million rows. For queries that return hundreds or thousands of rows (most real workloads), the overhead is single-digit milliseconds.
Bulk insert saves 0.9 seconds at 1M by using chunked multi-row INSERT instead of executemany().
What’s Irreducible
CSV export (1.34x): Per-field extraction from JSON dicts plus _serialize_value() per field. Both arms parse JSON; sqler’s overhead is the dict-to-row conversion that CSV format requires. This could shrink with a C-level JSON-to-CSV converter, but in Python it’s structural.
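The structural cost is the per-cell work. A hedged sketch of the shape of that conversion; serialize() here is a hypothetical stand-in for a _serialize_value()-style helper, not sqler's actual function:

```python
import csv
import io
import json

def serialize(value):
    # Hypothetical stand-in for a per-field serializer: nested values
    # must be re-encoded as JSON strings to fit a flat CSV cell.
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return "" if value is None else value

def dicts_to_csv(dicts, fields):
    """Write JSON-decoded dicts as CSV rows. Every cell is extracted
    and serialized individually, which is the work CSV format forces."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(fields)
    for d in dicts:
        writer.writerow([serialize(d.get(f)) for f in fields])
    return buf.getvalue()

rows = [json.loads('{"name": "a", "tags": [1, 2]}')]
print(dicts_to_csv(rows, ["name", "tags"]))
```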
Query overhead (3–15%): Query compilation, adapter wrapping, and the row factory artifact. At 1M rows with a 5-predicate complex query, that’s 200ms of absolute overhead. For indexed lookups and top-N queries, the absolute overhead is under 10ms; whether that matters depends on whether 10ms matters to your use case.
Aggregate overhead (6–13%): Same sources as query overhead. At 1M rows, the most expensive aggregate (sum) takes 1.13x or about 100ms extra.
The msgspec Read Path
M-5 added SQLerMsgspecModel as an opt-in alternative to SQLerLiteModel. For read-heavy workloads where hydration matters:
| Path | Speed vs Lite default |
|---|---|
| queryset.all() with SQLerLiteModel | 1.0x (baseline) |
| queryset.all() with SQLerMsgspecModel | 1.46x faster |
| queryset.as_dicts() | ~2.8x faster |
The as_dicts() path returns raw dicts with no hydration, fastest when you don’t need model instances. The msgspec path gives full type safety at 1.46x the speed of dataclasses. Details in Part 3.
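The gap comes down to how much work happens per row after the bytes leave SQLite. A minimal illustration of the two extremes, with a plain dataclass standing in for the model path (this is not sqler's code, and msgspec itself is omitted; it sits between these because it decodes straight into typed structs):

```python
import json
import timeit
from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

raw = [json.dumps({"name": f"u{i}", "age": i}) for i in range(1_000)]

def as_models():
    # Hydration path: decode JSON, then construct a model per row.
    return [User(**json.loads(r)) for r in raw]

def as_dicts():
    # Dict path: decode only, no per-row object construction.
    return [json.loads(r) for r in raw]

print(timeit.timeit(as_models, number=50))
print(timeit.timeit(as_dicts, number=50))  # typically the faster path
```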
Hydration repeatability (4 independent runs)
| Run context | Pure validate | End-to-end all() (disk) |
|---|---|---|
| Medium suite | 5.15x | 1.47x |
| Large suite | 4.62x | 1.43x |
| Xlarge suite | 5.01x | 1.46x |
| Xxlarge suite | 4.97x | 1.43x |
The hydration benchmark uses fixed 50K rows regardless of the scale parameter. Running it as part of each scale’s full suite provides four independent measurements; the ratios are highly repeatable.
Known Caveats
- Row factory artifact (3–6%): The baseline uses sqlite3.Row with string key access; sqler uses integer index access. This consistently makes sqler appear ~5% faster on reads, which it isn’t. Documented in Part 1.
- Single machine: All measurements on one machine (Linux x86_64, 8 cores). Different hardware, OS, or Python versions may shift ratios.
- No update/delete benchmarks: The suite measures insert and query extensively but doesn’t benchmark update() or delete() at scale.
- No concurrent write contention: Optimistic locking is tested for correctness, not throughput under contention.
- 100K CSV anomaly: One 100K run showed 0.32x for CSV export (the sqlite baseline took 2,979ms instead of the expected ~716ms). Environmental outlier; not reproducible at other scales. Excluded from trend analysis.
Running It Yourself
```bash
# Install with benchmark dependencies
uv sync --all-groups

# Run at medium scale (~20 minutes, memory + disk)
uv run python -m benchmarks run --scale medium --storage both

# Run at large scale (~40 minutes)
uv run python -m benchmarks run --scale large --storage both

# Generate charts from latest results
uv run --group benchmarks python -m benchmarks plot

# List all scenarios
uv run python -m benchmarks list
```
The benchmark suite enforces matched PRAGMAs, arm alternation, and GC isolation automatically. Results are saved as JSON in benchmarks/results/.
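A toy version of that harness, a sketch rather than the suite's actual implementation, shows what arm alternation, warmup, and GC isolation look like in a few lines:

```python
import gc
import statistics
import time

def bench(arms, iterations=20, warmup=3):
    """Minimal paired benchmark: alternate arms within each iteration so
    system drift hits both equally, and keep GC out of the timed region."""
    times = {name: [] for name in arms}
    for fn in arms.values():          # warmup: populate caches, JIT-ish effects
        for _ in range(warmup):
            fn()
    for _ in range(iterations):
        for name, fn in arms.items():  # arm alternation
            gc.collect()
            gc.disable()               # GC isolation around the timed call
            t0 = time.perf_counter()
            fn()
            times[name].append(time.perf_counter() - t0)
            gc.enable()
    return {name: statistics.median(ts) for name, ts in times.items()}

result = bench({"a": lambda: sum(range(10_000)),
                "b": lambda: sum(range(12_000))}, iterations=5, warmup=1)
print(sorted(result))  # → ['a', 'b']
```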
The Journey
This series started with 22 benchmarks that all told the same story: sqler is fast. That story was wrong, or rather, it was the answer to a question nobody asked. “Is sqler fast with better configuration than the baseline?” is not a useful question.
The adversarial audit found 23 fairness issues across four rounds. The fair rewrite made every number worse. The optimization pass fixed five real bottlenecks, two of which turned out to be benchmark bugs, not code bugs.
The final numbers: bulk insert 0.89x (faster than raw), FTS ranked 1.00x (perfect parity), everything else ≤1.15x, one irreducible gap at 1.34x. Confirmed at 50K, 100K, 500K, and 1M rows across 1,725 measurements and 10.5 hours of runtime.
Whether these numbers are good enough depends on what you’re building. For most workloads (the kind where you’re querying a few thousand rows and exporting results), the overhead is under 10ms in absolute terms. For million-row bulk operations, sqler is at parity or faster than the naive executemany() approach. The ORM layer is not free, but the cost is stable, predictable, and small relative to the SQL work underneath it.
Start from the beginning: Part 1: The Methodology.
