
sqler: The Final Scorecard

Part 1 found 18 fairness issues and rewrote the methodology. Part 2 traced the overhead to Pydantic and fixed exports. Part 3 fixed four more bottlenecks across five milestones. This post runs the full suite one last time.

Part 4 of 5. Part 1: The Methodology | Part 2: The Overhead | Part 3: The Fixes | Part 5: The Columnar Baseline


What We Ran

24 scenarios, 20 measured iterations each plus 3 warmup, --storage both, arm alternation, GC isolation. Four scales:

Scale | Measurements | Runtime
Medium (50K) | 435 | ~18 min
Large (100K) | 432 | ~40 min
Xlarge (500K) | 429 | ~3 hrs
Xxlarge (1M) | 429 | ~6.5 hrs
Total | 1,725 | ~10.5 hrs

Combined with the pre-optimization validation: 3,288 total measurements across the series.

Environment: Python 3.12.11 | sqler 1.2026.3.5 | SQLite 3.50.4 | Linux x86_64 (8 cores)


The Scorecard

Final scorecard

At 1M rows, disk mode:

Category | Ratio | Verdict
Bulk insert | 0.89x | Faster than raw sqlite
Export JSON | 0.97x | At parity
JSONL (no ID) | 0.98x | At parity
FTS ranked | 1.00x | Perfect parity
Backup/restore | 1.01x | Transparent
FTS rebuild | 1.03x | Near parity
any_where | 1.04x | Near parity
Export JSONL | 1.06x | Near parity
Queries | 1.03–1.15x | Stable overhead
Aggregates | 1.06–1.13x | Stable overhead
JSON ops | 1.07–1.14x | Stable overhead
Export CSV | 1.34x | Irreducible

Everything ≤1.15x except CSV export. The query layer adds 5–12% overhead; that’s the cost of query compilation, the adapter layer, and the row factory artifact documented in Part 1. None of it grows with scale, which matters more than the percentage itself.


Every Optimization Holds

The headline from the cross-scale validation: no regressions at any scale.

Optimization | 50K | 100K | 500K | 1M | Pre-fix
Bulk insert (M-3) | 0.96x | 0.91x | 0.91x | 0.89x | 1.87–1.92x
any_where (M-2) | 1.02x | 1.03x | 1.05x | 1.04x | 1.47–1.51x
FTS rebuild (M-1) | 1.04x | 1.02x | 1.07x | 1.03x | 3.78–4.65x
FTS ranked (M-4) | 0.98x | 1.01x | 1.00x | 1.00x | 1.50x at 500K+
Hydration (M-5) | 5.15x | 4.62x | 5.01x | 4.97x | (new capability)

Bulk insert gets more efficient at scale: 0.96x at 50K → 0.89x at 1M. The multi-row INSERT pattern’s parser savings compound as row count grows; 50 SQL calls versus 1M individual statements.
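The post doesn't show the pattern itself; here is a minimal sketch of chunked multi-row INSERT, with illustrative names and chunk size (not sqler's actual implementation):

```python
import sqlite3

def bulk_insert_chunked(conn, table, rows, chunk_size=500):
    """Insert dict rows as multi-row INSERT statements, chunk_size rows each.

    Each statement carries chunk_size * n_cols bound parameters, so the
    parser runs once per chunk instead of once per row. Keep chunk_size *
    n_cols under SQLite's variable limit (32,766 in recent versions).
    """
    if not rows:
        return
    cols = list(rows[0].keys())
    col_sql = ", ".join(cols)
    row_ph = "(" + ", ".join("?" for _ in cols) + ")"
    for i in range(0, len(rows), chunk_size):
        chunk = rows[i:i + chunk_size]
        sql = (f"INSERT INTO {table} ({col_sql}) VALUES "
               + ", ".join([row_ph] * len(chunk)))
        # Flatten the chunk into one parameter list, column order preserved.
        params = [row[c] for row in chunk for c in cols]
        conn.execute(sql, params)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER, body TEXT)")
bulk_insert_chunked(conn, "docs",
                    [{"id": i, "body": f"row {i}"} for i in range(1200)])
print(conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0])  # → 1200
```

The last, shorter chunk simply gets a smaller VALUES list; no padding is needed.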

FTS ranked is the more important validation. At 1M rows: 1,037ms vs 1,033ms. This was the only scenario that previously worsened at scale (from 0.95x at 50K to 1.50x at 1M); the single-JOIN fix from Part 3 eliminated the regression entirely.
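The shape of a single-JOIN ranked FTS5 query can be sketched as follows; the schema and table names are assumptions for illustration, not sqler's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT);
CREATE VIRTUAL TABLE docs_fts USING fts5(body, content='docs', content_rowid='id');
INSERT INTO docs (body) VALUES ('quick brown fox'), ('lazy dog'), ('brown bear');
INSERT INTO docs_fts (rowid, body) SELECT id, body FROM docs;
""")

# One JOIN from the FTS index back to the content table, ordered by bm25
# rank (lower bm25 = better match). The regression the post describes came
# from a query shape with extra per-match work; a single JOIN avoids it.
rows = conn.execute("""
    SELECT d.id, d.body
    FROM docs_fts
    JOIN docs d ON d.id = docs_fts.rowid
    WHERE docs_fts MATCH ?
    ORDER BY bm25(docs_fts)
""", ("brown",)).fetchall()
print(sorted(r[0] for r in rows))  # → [1, 3]
```

With an external-content FTS table this JOIN is the only trip back to the row data, regardless of corpus size.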


Ratio Stability Across Scale

Ratio stability

The ratios are flat lines. Bulk insert improves slightly at scale; everything else holds within measurement noise. This means the overhead is proportional: per-row costs, not algorithmic. The ORM doesn’t introduce complexity that grows faster than the underlying SQL work.

A note on query ratios: Part 1’s v1.2 data showed queries at 0.95–0.97x in memory mode, with sqler apparently faster than raw sqlite3. The final suite measures disk mode, where the row factory artifact documented in Part 1 no longer masks the true overhead. The corrected numbers are 1.03–1.15x; this is the real cost of query compilation and adapter wrapping, not a regression introduced by the optimization pass.
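The two access styles behind that artifact can be shown in a few lines of plain sqlite3 (illustrative only; the timing gap between them is the ~5% effect, not shown here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'x')")

# Baseline arm: sqlite3.Row with string-key access (a name lookup per field).
conn.row_factory = sqlite3.Row
row = conn.execute("SELECT a, b FROM t").fetchone()
by_key = (row["a"], row["b"])

# sqler-style arm: default tuple rows with integer index access, which is
# cheaper per field and makes the ORM arm look slightly faster on reads.
conn.row_factory = None
tup = conn.execute("SELECT a, b FROM t").fetchone()
by_index = (tup[0], tup[1])

print(by_key == by_index)  # → True
```

Both arms read the same data; only the per-field lookup cost differs.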

Detailed breakdowns (disk mode)

Queries:

Type | 50K | 100K | 500K | 1M
Equality filter (no index) | 1.13x | 1.21x | 1.24x | 1.14x
Range 50% selectivity | 1.15x | 1.15x | 1.15x | 1.11x
Complex 5-predicate | 1.03x | 1.06x | 1.04x | 1.08x
Top-N (1000) | 1.12x | 1.11x | 1.12x | 1.11x

The unindexed equality filter shows the highest variance (1.13–1.24x across scales); complex queries with more predicates show less overhead because SQLite spends more time evaluating conditions relative to sqler’s fixed per-query cost.

JSON operations:

Type | 50K | 100K | 500K | 1M
Array contains | 1.05x | 1.05x | 1.05x | 1.09x
Array isin | 1.01x | 1.03x | 1.02x | 1.07x
Nested field (depth 3) | 1.13x | 1.12x | 1.12x | 1.14x

Nested field access at depth 3 carries a consistent ~13% overhead; the flat array operations are closer to parity.
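Both arms ultimately run a json_extract over the stored document; a minimal example of the depth-3 path shape (illustrative, not sqler's generated SQL):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (data TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?)",
    [(json.dumps({"a": {"b": {"c": i}}}),) for i in range(10)],
)

# Depth-3 path: one json_extract per candidate row. sqler's extra cost is
# compiling the attribute chain down to this '$.a.b.c' path expression.
rows = conn.execute(
    "SELECT data FROM docs WHERE json_extract(data, '$.a.b.c') > ?", (6,)
).fetchall()
print(len(rows))  # → 3
```

json_extract returns a typed SQL value here (an integer), so the comparison runs natively in SQLite for both arms.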

Aggregates:

Type | 50K | 100K | 500K | 1M
sum | 1.17x | 1.15x | 1.17x | 1.13x
avg | 1.19x | 1.13x | 1.17x | 1.13x
min | 1.08x | 1.16x | 1.15x | 1.09x
max | 1.11x | 1.07x | 1.12x | 1.06x

Aggregates converge toward parity at scale; at 1M rows, every aggregate is at or below 1.13x.


Absolute Wall-Clock Cost at 1M Rows

Ratios tell you proportions; absolute numbers tell you whether to care.

Operation | sqler | sqlite | Overhead
FTS rebuild | 32.3s | 31.4s | +0.9s
Bulk insert | 7.0s | 7.9s | −0.9s
any_where | 6.5s | 6.2s | +0.3s
Export CSV | 10.3s | 7.7s | +2.6s
Complex 5-pred query | 2.4s | 2.2s | +0.2s
FTS ranked | 1.04s | 1.03s | +0.01s
Equality filter (no idx) | 1.2s | 1.1s | +0.1s
Backup | 614ms | 610ms | +4ms

The largest absolute penalty is CSV export: 2.6 extra seconds on 1M rows for per-field extraction overhead. Everything else is under 1 second of additional cost on a million rows. For queries that return hundreds or thousands of rows (most real workloads), the overhead is single-digit milliseconds.

Bulk insert saves 0.9 seconds at 1M by using chunked multi-row INSERT instead of executemany().


What’s Irreducible

CSV export (1.34x): Per-field extraction from JSON dicts plus _serialize_value() per field. Both arms parse JSON; sqler’s overhead is the dict-to-row conversion that CSV format requires. This could shrink with a C-level JSON-to-CSV converter, but in Python it’s structural.
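A sketch of that structural cost, with a stand-in for the serializer the post names (illustrative; the real _serialize_value is sqler's and may behave differently):

```python
import csv
import io
import json

def _serialize_value(v):
    # Stand-in: nested values must be re-encoded as JSON strings to fit a
    # flat CSV cell; scalars pass through. Assumed behavior, per the post.
    if isinstance(v, (dict, list)):
        return json.dumps(v)
    return v

def export_csv(json_rows, fieldnames):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(fieldnames)
    for raw in json_rows:
        doc = json.loads(raw)  # both arms pay this parse
        # The ORM-only cost: dict -> ordered row, one lookup plus one
        # serializer call per field per row. This is what CSV requires
        # and what JSON/JSONL export avoids.
        writer.writerow([_serialize_value(doc.get(f)) for f in fieldnames])
    return buf.getvalue()

rows = [json.dumps({"id": i, "tags": ["a", "b"]}) for i in range(3)]
out = export_csv(rows, ["id", "tags"])
print(out.splitlines()[0])  # → id,tags
```

At 1M rows that per-field loop runs millions of times in Python, which is why the gap shrinks only with a C-level converter.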

Query overhead (5–12%): Query compilation, adapter wrapping, and the row factory artifact. At 1M rows with a 5-predicate complex query, that’s 200ms of absolute overhead. For indexed lookups and top-N queries, the absolute overhead is under 10ms; whether that matters depends on whether 10ms matters to your use case.

Aggregate overhead (6–13%): Same sources as query overhead. At 1M rows, the most expensive aggregate (sum) takes 1.13x or about 100ms extra.


The msgspec Read Path

M-5 added SQLerMsgspecModel as an opt-in alternative to SQLerLiteModel. For read-heavy workloads where hydration matters:

Path | Speed vs Lite default
queryset.all() with SQLerLiteModel | 1.0x (baseline)
queryset.all() with SQLerMsgspecModel | 1.46x faster
queryset.as_dicts() | ~2.8x faster

The as_dicts() path returns raw dicts with no hydration, fastest when you don’t need model instances. The msgspec path gives full type safety at 1.46x the speed of dataclasses. Details in Part 3.
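The trade in that table can be sketched with stdlib stand-ins: a plain dataclass in place of SQLerLiteModel and json.loads in place of msgspec decoding (illustrative only; sqler's hydration internals are not shown in the post):

```python
import json
from dataclasses import dataclass

@dataclass
class User:
    id: int
    name: str

raw_rows = [json.dumps({"id": i, "name": f"u{i}"}) for i in range(5)]

# Model path: parse JSON, then construct a typed instance per row.
# This per-row construction is the hydration cost the table measures.
def all_models(rows):
    return [User(**json.loads(r)) for r in rows]

# as_dicts() path: parse JSON and stop — no per-row object construction,
# which is why it is the fastest path when model instances aren't needed.
def as_dicts(rows):
    return [json.loads(r) for r in rows]

models = all_models(raw_rows)
dicts = as_dicts(raw_rows)
print(models[0].name, dicts[0]["name"])  # → u0 u0
```

msgspec's advantage over the dataclass path comes from decoding straight into typed structs in C, collapsing the parse and construct steps into one.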

Hydration repeatability (4 independent runs)

Run context | Pure validate | End-to-end all() (disk)
Medium suite | 5.15x | 1.47x
Large suite | 4.62x | 1.43x
Xlarge suite | 5.01x | 1.46x
Xxlarge suite | 4.97x | 1.43x

The hydration benchmark uses fixed 50K rows regardless of the scale parameter. Running it as part of each scale’s full suite provides four independent measurements; the ratios are highly repeatable.


Known Caveats

  1. Row factory artifact (3–6%): The baseline uses sqlite3.Row with string key access; sqler uses integer index access. This consistently makes sqler appear ~5% faster on reads, which it isn’t. Documented in Part 1.

  2. Single machine: All measurements on one machine (Linux x86_64, 8 cores). Different hardware, OS, or Python versions may shift ratios.

  3. No update/delete benchmarks: The suite measures insert and query extensively but doesn’t benchmark update() or delete() at scale.

  4. No concurrent write contention: Optimistic locking is tested for correctness, not throughput under contention.

  5. 100K CSV anomaly: One 100K run showed 0.32x for CSV export (the sqlite baseline took 2,979ms instead of the expected ~716ms). Environmental outlier; not reproducible at other scales. Excluded from trend analysis.


Running It Yourself

# Install with benchmark dependencies
uv sync --all-groups

# Run at medium scale (~20 minutes, memory + disk)
uv run python -m benchmarks run --scale medium --storage both

# Run at large scale (~40 minutes)
uv run python -m benchmarks run --scale large --storage both

# Generate charts from latest results
uv run --group benchmarks python -m benchmarks plot

# List all scenarios
uv run python -m benchmarks list

The benchmark suite enforces matched PRAGMAs, arm alternation, and GC isolation automatically. Results are saved as JSON in benchmarks/results/.
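A minimal sketch of the arm-alternation and GC-isolation mechanics the suite enforces (names and structure are assumptions, not the actual harness):

```python
import gc
import time

def bench_pair(arm_a, arm_b, iterations=20, warmup=3):
    """Alternate two benchmark arms and isolate GC, as the suite does."""
    for fn in (arm_a, arm_b):          # warmup both arms first
        for _ in range(warmup):
            fn()
    times = {arm_a: [], arm_b: []}
    for i in range(iterations):
        # Arm alternation: A,B on even iterations, B,A on odd, so slow
        # drift (thermal, cache, background load) cancels across arms.
        order = (arm_a, arm_b) if i % 2 == 0 else (arm_b, arm_a)
        for fn in order:
            gc.collect()               # start from a clean heap
            gc.disable()               # GC isolation: no pause mid-measurement
            t0 = time.perf_counter()
            fn()
            times[fn].append(time.perf_counter() - t0)
            gc.enable()
    return {fn.__name__: sum(v) / len(v) for fn, v in times.items()}

def arm_sqler(): sum(range(10_000))
def arm_sqlite(): sum(range(10_000))

res = bench_pair(arm_sqler, arm_sqlite, iterations=4, warmup=1)
print(sorted(res))  # → ['arm_sqler', 'arm_sqlite']
```

Matched PRAGMAs are the remaining piece: both arms must open their connections with identical journal_mode, synchronous, and cache settings, or the comparison measures configuration rather than code.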


The Journey

This series started with 22 benchmarks that all told the same story: sqler is fast. That story was wrong, or rather, it was the answer to a question nobody asked. “Is sqler fast with better configuration than the baseline?” is not a useful question.

The adversarial audit found 23 fairness issues across four rounds. The fair rewrite made every number worse. The optimization pass fixed five real bottlenecks, two of which turned out to be benchmark bugs, not code bugs.

The final numbers: bulk insert 0.89x (faster than raw), FTS ranked 1.00x (perfect parity), everything else ≤1.15x, one irreducible gap at 1.34x. Confirmed at 50K, 100K, 500K, and 1M rows across 1,725 measurements and 10.5 hours of runtime.

Whether these numbers are good enough depends on what you’re building. For most workloads (the kind where you’re querying a few thousand rows and exporting results), the overhead is under 10ms in absolute terms. For million-row bulk operations, sqler is at parity or faster than the naive executemany() approach. The ORM layer is not free, but the cost is stable, predictable, and small relative to the SQL work underneath it.

Start from the beginning: Part 1: The Methodology.