Search code examples
sqlruby-on-railspostgresqlherokuheroku-postgres

Huge divergence in query estimate vs actual time in postgres


Between two different environments with identical databases (local machine, and production on heroku) we are seeing a large difference in execution time for the same, fairly simple, query.

The query is:

SELECT "property_tax_bills".* FROM "property_tax_bills" INNER JOIN "property_tax_bill_parsed_addresses" ON "property_tax_bills"."id" = "property_tax_bill_parsed_addresses"."property_tax_bill_id" WHERE "property_tax_bill_parsed_addresses"."parsed_address_id" = 2 AND "property_tax_bills"."statement_date" = '2021-11-20';

property_tax_bills is a large-ish table (44 million records), and the join table property_tax_bill_parsed_addresses is equally large.

The EXPLAIN ANALYZE command run via psql locally returns the following:

 Gather  (cost=1105.30..64845.64 rows=219 width=147) (actual time=15.054..18.941 rows=101 loops=1)
   Output: property_tax_bills.id, property_tax_bills.owner_name, property_tax_bills.property_address, property_tax_bills.mailing_address, property_tax_bills.statement_date, property_tax_bills.pluto_record_id, property_tax_bills.created_at, property_tax_bills.updated_at, property_tax_bills.bbl
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=25539
   ->  Nested Loop  (cost=105.30..63823.74 rows=91 width=147) (actual time=12.147..12.291 rows=34 loops=3)
         Output: property_tax_bills.id, property_tax_bills.owner_name, property_tax_bills.property_address, property_tax_bills.mailing_address, property_tax_bills.statement_date, property_tax_bills.pluto_record_id, property_tax_bills.created_at, property_tax_bills.updated_at, property_tax_bills.bbl
         Inner Unique: true
         Buffers: shared hit=25539
         Worker 0:  actual time=10.752..10.898 rows=36 loops=1
           Buffers: shared hit=7647
         Worker 1:  actual time=11.014..11.162 rows=28 loops=1
           Buffers: shared hit=6483
         ->  Parallel Bitmap Heap Scan on public.property_tax_bill_parsed_addresses  (cost=104.73..32879.67 rows=3672 width=8) (actual time=0.634..3.758 rows=1442 loops=3)
               Output: property_tax_bill_parsed_addresses.id, property_tax_bill_parsed_addresses.property_tax_bill_id, property_tax_bill_parsed_addresses.parsed_address_id, property_tax_bill_parsed_addresses.raw_address, property_tax_bill_parsed_addresses.processed_address, property_tax_bill_parsed_addresses.created_at, property_tax_bill_parsed_addresses.updated_at, property_tax_bill_parsed_addresses.is_verified, property_tax_bill_parsed_addresses.error_message
               Recheck Cond: (property_tax_bill_parsed_addresses.parsed_address_id = 2)
               Heap Blocks: exact=1745
               Buffers: shared hit=3912
               Worker 0:  actual time=0.041..2.924 rows=1295 loops=1
                 Buffers: shared hit=1171
               Worker 1:  actual time=0.045..3.356 rows=1099 loops=1
                 Buffers: shared hit=987
               ->  Bitmap Index Scan on part_i_ptbpa_o_p_a_id_where_not_null  (cost=0.00..102.53 rows=8812 width=0) (actual time=1.049..1.049 rows=4325 loops=1)
                     Index Cond: (property_tax_bill_parsed_addresses.parsed_address_id = 2)
                     Buffers: shared hit=9
         ->  Index Scan using property_tax_bills_pkey on public.property_tax_bills  (cost=0.56..8.43 rows=1 width=147) (actual time=0.006..0.006 rows=0 loops=4325)
               Output: property_tax_bills.id, property_tax_bills.owner_name, property_tax_bills.property_address, property_tax_bills.mailing_address, property_tax_bills.statement_date, property_tax_bills.pluto_record_id, property_tax_bills.created_at, property_tax_bills.updated_at, property_tax_bills.bbl
               Index Cond: (property_tax_bills.id = property_tax_bill_parsed_addresses.property_tax_bill_id)
               Filter: (property_tax_bills.statement_date = '2021-11-20'::date)
               Rows Removed by Filter: 1
               Buffers: shared hit=21627
               Worker 0:  actual time=0.006..0.006 rows=0 loops=1295
                 Buffers: shared hit=6476
               Worker 1:  actual time=0.007..0.007 rows=0 loops=1099
                 Buffers: shared hit=5496
 Planning:
   Buffers: shared hit=498
 Planning Time: 1.750 ms
 Execution Time: 19.000 ms

While on Production, the same command run via heroku pg:psql returns this:

 Nested Loop  (cost=0.20..7.50 rows=1 width=147) (actual time=15.895..2152.165 rows=101 loops=1)
   Output: property_tax_bills.id, property_tax_bills.owner_name, property_tax_bills.property_address, property_tax_bills.mailing_address, property_tax_bills.statement_date, property_tax_bills.pluto_record_id, property_tax_bills.created_at, property_tax_bills.updated_at, property_tax_bills.bbl
   Inner Unique: true
   Buffers: shared hit=5581088
   ->  Index Scan using index_property_tax_bills_on_statement_date on public.property_tax_bills  (cost=0.09..3.38 rows=1 width=147) (actual time=0.034..208.051 rows=1110860 loops=1)
         Output: property_tax_bills.id, property_tax_bills.owner_name, property_tax_bills.property_address, property_tax_bills.mailing_address, property_tax_bills.statement_date, property_tax_bills.pluto_record_id, property_tax_bills.created_at, property_tax_bills.updated_at, property_tax_bills.bbl
         Index Cond: (property_tax_bills.statement_date = '2021-11-20'::date)
         Buffers: shared hit=26788
   ->  Index Scan using i_pr_tax_bill_p_a_o_p_r_b_id on public.property_tax_bill_parsed_addresses  (cost=0.11..4.12 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=1110860)
         Output: property_tax_bill_parsed_addresses.id, property_tax_bill_parsed_addresses.property_tax_bill_id, property_tax_bill_parsed_addresses.parsed_address_id, property_tax_bill_parsed_addresses.raw_address, property_tax_bill_parsed_addresses.processed_address, property_tax_bill_parsed_addresses.created_at, property_tax_bill_parsed_addresses.updated_at, property_tax_bill_parsed_addresses.is_verified, property_tax_bill_parsed_addresses.error_message
         Index Cond: (property_tax_bill_parsed_addresses.property_tax_bill_id = property_tax_bills.id)
         Filter: (property_tax_bill_parsed_addresses.parsed_address_id = 2)
         Rows Removed by Filter: 1
         Buffers: shared hit=5554300
 Planning Time: 0.225 ms
 Execution Time: 2152.224 ms

As you can see, the query plan locally is far more complex and estimates out to be much longer. However on prod, the query plan is straightforward and intuitive, estimates quickly, but somehow actually executes over 2 orders of magnitude slower than locally. The app is running live, but with very little traffic (4-5 users, max 15 requests/minute).

Some additional information on specs of each machine:

Local has 64GB of RAM and 12 cores running on a NVME roughly capable of 5 GB/s of read/write - Postgres version 13.4

Production is a standard-9 764GB of RAM postgres instance on Heroku. - Postgres client version 13.4


Solution

  • This is your problem:

    Index Scan using i_pr_tax_bill_p_a_o_p_r_b_id on public.property_tax_bill_parsed_addresses (cost=0.11..4.12 rows=1 width=8) (actual time=0.002..0.002 rows=0 loops=1110860) ... Index Cond: (property_tax_bill_parsed_addresses.property_tax_bill_id = property_tax_bills.id) Filter: (property_tax_bill_parsed_addresses.parsed_address_id = 2)

    Rows Removed by Filter: 1

    It's doing 1110860 index scans and after successfully finding the data, removing most of it.

    Add the parsed_address_id to this index, to avoid the filtering afterwards.

    CREATE INDEX idx_name_of_your_index ON property_tax_bill_parsed_addresses (property_tax_bill_id,parsed_address_id);
    

    Does the query plan change when you have this index in place?