Search code examples
sqlpostgresqlperformanceoptimization

Slow SQL query when almost no data


I have a fairly complicated SQL query that I'm running against Postgres 14. It runs fine in production with a few thousands rows and takes about 200 ms.

The same query on my development server with less than 10 rows takes 7 seconds!

Production hardware is nothing special - an AWS T3 medium. In development, it's a Virtualbox VM running on a Intel 9900k.

Other queries are roughly equivalent in performance between dev & prod.

The SQL query joins a few tables and views like this:

    SELECT product.*, track_view.*, release_view.* user.*
    FROM product
    LEFT JOIN track_view ON product.id = track_view.digital_product_id
    LEFT JOIN user ON user.id = COALESCE(track_view.user_id, release_view.seller_id);

The track_view and release_view have additional joins to other tables and do some json aggregation.

This is the bottom end of the explain analyse outputs run against each database.

Production explain Analyse

Development explain analyse

I'm not familiar that familiar with explain output but it looks like 5s was optimisation and 2.5s was emission.

How should I approach improving it?

Thanks!


Solution

  • From the docs:

    https://www.postgresql.org/docs/current/populate.html#POPULATE-ANALYZE

    14.4.8. Run ANALYZE Afterwards

    Whenever you have significantly altered the distribution of data within a table, running ANALYZE is strongly recommended. This includes bulk loading large amounts of data into the table. Running ANALYZE (or VACUUM ANALYZE) ensures that the planner has up-to-date statistics about the table. With no statistics or obsolete statistics, the planner might make poor decisions during query planning, leading to poor performance on any tables with inaccurate or nonexistent statistics. Note that if the autovacuum daemon is enabled, it might run ANALYZE automatically; see Section 25.1.3 and Section 25.1.6 for more information.

    For information on where those statistics are stored and for what purpose see:

    https://www.postgresql.org/docs/current/catalog-pg-statistic.html

    For more detail on the planner and how it works with the statistics see:

    https://www.postgresql.org/docs/current/performance-tips.html