Original: Simon Willison · 09/03/2026
Summary
The article discusses new PostgreSQL features that allow developers to replicate production query plans in development environments without needing production data.Key Insights
“Statistics dumps are tiny.” — Radim says about the size of statistics dumps compared to production data.
“You can inject whatever alternative statistics you like.” — D. Richard Hipp’s response regarding SQLite’s ability to control query planner statistics.
“These statistics are pretty small.” — Discussion on the size of statistics in relation to production data.
Topics
Full Article
9th March 2026 - Link Blog Production query plans without production data (via) Radim Marek describes the new pg_restore_relation_stats() and pg_restore_attribute_stats() functions that were introduced in PostgreSQL 18 in September 2025. The PostgreSQL query planner makes use of internal statistics to help it decide how to best execute a query. These statistics often differ between production data and development environments, which means the query plans used in production may not be replicable in development. PostgreSQL’s new features now let you copy those statistics down to your development environment, allowing you to simulate the plans for production workloads without needing to copy in all of that data first. I found this illustrative example useful: SELECT pg_restore_attribute_stats( ‘schemaname’, ‘public’, ‘relname’, ‘test_orders’, ‘attname’, ‘status’, ‘inherited’, false::boolean, ‘null_frac’, 0.0::real, ‘avg_width’, 9::integer, ‘n_distinct’, 5::real, ‘most_common_vals’, ‘{delivered,shipped,cancelled,pending,returned}‘::text, ‘most_common_freqs’, ‘{0.95,0.015,0.015,0.015,0.005}‘::real[] ); This simulates statistics for a status column that is 95% delivered. Based on these statistics PostgreSQL can decide to use an index for status = ‘shipped’ but to instead perform a full table scan for status = ‘delivered’. These statistics are pretty small. Radim says: Statistics dumps are tiny. A database with hundreds of tables and thousands of columns produces a statistics dump under 1MB. The production data might be hundreds of GB. The statistics that describe it fit in a text file. I posted on the SQLite user forum asking if SQLite could offer a similar feature and D. Richard Hipp promptly replied that it has one already: All of the data statistics used by the query planner in SQLite are available in the sqlite_stat1 table (or also in the sqlite_stat4 table if you happen to have compiled with SQLITE_ENABLE_STAT4). That table is writable. You can inject whatever alternative statistics you like. This approach to controlling the query planner is mentioned in the documentation: https://sqlite.org/optoverview.html#manual_control_of_query_plans_using_sqlite_stat_tables. See also https://sqlite.org/lang_analyze.html#fixed_results_of_analyze. The “.fullschema” command in the CLI outputs both the schema and the content of the sqlite_statN tables, exactly for the reasons outlined above - so that we can reproduce query problems for testing without have to load multi-terabyte database files.Related Articles
Structured Context Engineering for File-Native Agentic Systems
Simon Willison · explanation · 38% similar
cysqlite - a new sqlite driver
Simon Willison · explanation · 37% similar
A quote from Ally Piechowski
Simon Willison · how-to · 36% similar
Originally published at https://simonwillison.net/2026/Mar/9/production-query-plans-without-production-data/#atom-everything.