Search code examples
sqldatabasepostgresqlquery-optimization

Slow queries in specific PostgreSQL client/version


I have a query that returns 0 rows, but ends up - in one specific machine - taking around 10 seconds to complete. In every other machine i tested with the same database (although different postgres versions) and the query is almost instant.

Also, the results i get by doing "explain (analyze)" on the query are sort of the same in the "fast" machines, but a bit different in the problematic one, it seems that it never does a parallel seq scan, and all the other machines do, even though the parallel worker configurations are all the same in every machine. Results (all same database, PC#1 is the problematic one):


PC#1, PostgreSQL 11.5

Seq Scan on cc_parc  (cost=0.00..1175721.63 rows=58 width=853) (actual time=9537.152..9537.152 rows=0 loops=1)
  Filter: ((class = 'db1.C_Parcela'::text) AND (fact = 221327))
  Rows Removed by Filter: 217422
  Buffers: shared hit=8034 read=1163008
Planning Time: 0.448 ms
Execution Time: 9537.602 ms


PC#2, PostgreSQL 12.1

Gather  (cost=1000.00..11684.49 rows=6 width=377) (actual time=136.784..139.779 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=344 read=8981
  ->  Parallel Seq Scan on cc_parc  (cost=0.00..10683.89 rows=2 width=377) (actual time=30.167..30.167 rows=0 loops=3)
        Filter: ((class = 'db1.C_Parcela'::text) AND (fact = 221327))
        Rows Removed by Filter: 72474
        Buffers: shared hit=344 read=8981
Planning Time: 1.945 ms
Execution Time: 139.860 ms


PC#3, PostgreSQL 13.2

Gather  (cost=1000.00..11684.39 rows=5 width=1495) (actual time=70.610..88.546 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=212 read=9113
  ->  Parallel Seq Scan on cc_parc  (cost=0.00..10683.89 rows=2 width=1495) (actual time=43.353..43.353 rows=0 loops=3)         Filter: ((class = 'db1.C_Parcela'::text) AND (fact = 221327))
        Rows Removed by Filter: 72474
        Buffers: shared hit=212 read=9113
Planning:
  Buffers: shared hit=311 read=6
Planning Time: 1.047 ms
Execution Time: 88.604 ms


PC#4, PostgreSQL 10.5

Gather  (cost=1000.00..11685.63 rows=6 width=377) (actual time=98.253..121.924 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=2 read=9324
  ->  Parallel Seq Scan on cc_parc  (cost=0.00..10685.03 rows=2 width=377) (actual time=53.534..53.535 rows=0 loops=3)
        Filter: ((class = 'db1.C_Parcela'::text) AND (fact = 221327))
        Rows Removed by Filter: 72474
        Buffers: shared hit=2 read=9324
Planning time: 7.315 ms
Execution time: 122.090 ms

Also the number of filtered rows are different, the specific query ( select * from cc_parc where class='db1.C_Parcela' and fact=221327; ) returns 0 rows, its count value is 217422, but in the different machines the number of filteres rows is different,

PC#1 - Rows Removed by Filter: 217422

All other PCs - Rows Removed by Filter: 72474

I have also tried running all of the same tests in the same machine but with PostgreSQL 12 and the situation is solved, the values are the expected ones again:

PC#1, PostgreSQL 12.2

Gather  (cost=1000.00..11684.59 rows=7 width=377) (actual time=94.210..99.924 rows=0 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  Buffers: shared hit=370 read=8955
  ->  Parallel Seq Scan on cc_parc  (cost=0.00..10683.89 rows=3 width=377) (actual time=51.816..51.816 rows=0 loops=3)
        Filter: ((class = 'db1.C_Parcela'::text) AND (fact = 221327))
        Rows Removed by Filter: 72474
        Buffers: shared hit=370 read=8955
Planning Time: 4.815 ms
Execution Time: 100.059 ms

Am i safe to assume there was some sort of bug in a PostgreSQL 11 version that caused this, or does anyone have any other idea?


Solution

  • I noticed that after doing a dump and restore in the same psql version the issue resolved itself.

    After analysing the output of select * from pgstattuple('public.cc_parc'); and comparing the result in a "bugged" database VS a correctly functioning one, I noticed that the table_length in the bugged db was 125x bigger... which is certainly not supposed to be the fact.

    After running a VACUUM FULL on the table, table_length was restored to its correct value and the slowness issue completely solved.

    Running a full vacuum does not seem to be advisable, so I will try to find what bloated the database like this. Apparently most of the suggestions in here seem to be appropriate