Search code examples
postgresqljoinindexingquery-performanceexplain

Speed up Postgresql query with multiple joins


Please help me to optimize the following query:

EXPLAIN ANALYZE
SELECT 
"subscriptions"."id" AS t0_r0,
"subscriptions"."created_at" AS t0_r3, 
"subscriptions"."updated_at" AS t0_r4, 
"subscriptions"."next_date" AS t0_r5, 
"subscriptions"."number_of_games" AS t0_r6, 
"subscriptions"."renewal_date" AS t0_r7, 
"subscriptions"."type" AS t0_r8, 
"subscriptions"."order_id" AS t0_r9,
"orders"."id" AS t1_r0, 
"orders"."customer_id" AS t1_r1, 
"orders"."created_at" AS t1_r2, 
"orders"."updated_at" AS t1_r3, 
"orders"."payment_id" AS t1_r4, 
"orders"."status" AS t1_r5,
"orders"."col13" AS t1_r13, 
"orders"."col14" AS t1_r14, 
"orders"."col15" AS t1_r15,
"orders"."active_subscription_id" AS t1_r21, 
"orders"."product_id" AS t1_r22 
FROM 
"subscriptions" 
INNER JOIN "orders" ON "orders"."id" = "subscriptions"."order_id" 
WHERE 
"subscriptions"."type" IN ('Const1') 
AND "orders"."status" = 'confirm' 
AND "orders"."product_id" IN (1, 95, 79, 22) 
AND ("subscriptions"."renewal_date" BETWEEN '2017-09-23' AND '2017-09-29') AND (orders.active_subscription_id = subscriptions.id) 
AND ("subscriptions"."number_of_games" >= 5) 
AND ("subscriptions"."id" NOT IN (
SELECT subscriptions.id 
FROM "subscriptions" 
INNER JOIN "orders" ON "orders"."id" = "subscriptions"."order_id" 
INNER JOIN "table1" ON "table1"."order_id" = "orders"."id" 
WHERE "subscriptions"."type" IN ('Const1') 
AND "orders"."status" = 'confirm' 
AND "orders"."product_id" IN (1, 95, 79, 22) 
AND "table1"."col1" IN ('1041', '1042') 
AND ("subscriptions"."renewal_date" BETWEEN '2017-09-23' AND '2017-09-29') 
AND (orders.active_subscription_id = subscriptions.id) 
AND ("subscriptions"."number_of_games" >= 5))
    ) ;

Initially there're b-tree indexes on:

CREATE INDEX index_table1_on_order_id ON table1 USING btree (order_id);
CREATE INDEX index_orders_on_active_subscription_id ON orders USING btree (active_subscription_id);
CREATE INDEX index_orders_on_status ON orders USING btree (status);
CREATE INDEX orders_payment_id_idx ON orders USING btree (payment_id);
CREATE INDEX index_subscriptions_on_order_id ON subscriptions USING btree (order_id);

All columns with name "id" are primary key. Execution plan:

    Nested Loop  (cost=18699.70..38236.80 rows=1 width=466) (actual time=11185.634..11336.548 rows=3352 loops=1)
   ->  Seq Scan on subscriptions  (cost=18699.28..37754.22 rows=57 width=76) (actual time=11185.610..11309.520 rows=3356 loops=1)
         Filter: ((renewal_date >= '2017-09-23'::date) AND (renewal_date <= '2017-09-29'::date) AND (number_of_games >= 5) AND (NOT (hashed SubPlan 1)) AND ((type)::text = 'Const1'::text))
         Rows Removed by Filter: 522626
         SubPlan 1
           ->  Nested Loop  (cost=0.85..18699.28 rows=1 width=4) (actual time=6743.644..11185.269 rows=31 loops=1)
                 ->  Nested Loop  (cost=0.42..18697.21 rows=1 width=12) (actual time=0.150..1792.440 rows=3383 loops=1)
                       ->  Seq Scan on subscriptions subscriptions_1  (cost=0.00..17740.06 rows=114 width=8) (actual time=0.114..145.256 rows=3387 loops=1)
                             Filter: ((renewal_date >= '2017-09-23'::date) AND (renewal_date <= '2017-09-29'::date) AND (number_of_games >= 5) AND ((type)::text = 'Const1'::text))
                             Rows Removed by Filter: 522595
                       ->  Index Scan using index_orders_on_active_subscription_id on orders orders_1  (cost=0.42..8.39 rows=1 width=8) (actual time=0.471..0.484 rows=1 loops=3387)
                             Index Cond: (active_subscription_id = subscriptions_1.id)
                             Filter: (((status)::text = 'confirm'::text) AND (subscriptions_1.order_id = id) AND (product_id = ANY ('{1,95,79,22}'::integer[])))
                             Rows Removed by Filter: 0
                 ->  Index Scan using index_table1_on_order_id on table1  (cost=0.43..2.05 rows=1 width=4) (actual time=2.775..2.775 rows=0 loops=3383)
                       Index Cond: (order_id = orders_1.id)
                       Filter: ((col1)::text = ANY ('{1041,1042}'::text[]))
                       Rows Removed by Filter: 5
   ->  Index Scan using index_orders_on_active_subscription_id on orders  (cost=0.42..8.46 rows=1 width=390) (actual time=0.007..0.007 rows=1 loops=3356)
         Index Cond: (active_subscription_id = subscriptions.id)
         Filter: (((status)::text = 'confirm'::text) AND (subscriptions.order_id = id) AND (product_id = ANY ('{1,95,79,22}'::integer[])))
         Rows Removed by Filter: 0
 Planning time: 3.928 ms
 Execution time: 11337.023 ms

Creating the following index:

CREATE INDEX index_subscriptions_on_renewal_date ON subscriptions USING btree (renewal_date);

doesn't make things much better. Even rewriting the query doesn't improve the performance either:

EXPLAIN ANALYZE
With subscriptions_1 as (
SELECT 
"subscriptions"."id" AS t0_r0, 
"subscriptions"."created_at" AS t0_r3, 
"subscriptions"."updated_at" AS t0_r4, 
"subscriptions"."next_date" AS t0_r5, 
"subscriptions"."number_of_games" AS t0_r6, 
"subscriptions"."renewal_date" AS t0_r7, 
"subscriptions"."type" AS t0_r8, 
"subscriptions"."order_id" AS t0_r9
FROM 
"subscriptions"
WHERE
"subscriptions"."type" IN ('Const1')
AND ("subscriptions"."renewal_date" >= '2017-09-23' AND "subscriptions"."renewal_date" <= '2017-09-29')
AND ("subscriptions"."number_of_games" >= 5)
ORDER BY "subscriptions"."id"
)
SELECT
Subscriptions_1.*,
"orders"."id" AS t1_r0, 
"orders"."customer_id" AS t1_r1, 
"orders"."created_at" AS t1_r2, 
"orders"."updated_at" AS t1_r3, 
"orders"."payment_id" AS t1_r4, 
"orders"."status" AS t1_r5, 
"orders"."col13" AS t1_r13, 
"orders"."col14" AS t1_r14, 
"orders"."col15" AS t1_r15, 
"orders"."active_subscription_id" AS t1_r21, 
"orders"."product_id" AS t1_r22
FROM
Subscriptions_1
INNER JOIN "orders" ON "orders"."id" = subscriptions_1.t0_r9 
WHERE
"orders"."status" = 'confirm' 
AND "orders"."product_id" IN (1,95,79,22)
AND (orders.active_subscription_id = subscriptions_1.t0_r0)
AND (subscriptions_1.t0_r0 NOT IN (SELECT subscriptions_1.t0_r0 FROM subscriptions_1 INNER JOIN "orders" ON "orders"."id" = subscriptions_1.t0_r9 INNER JOIN "table1" ON "table1"."order_id" = "orders"."id" WHERE "orders"."status" = 'confirm' AND "orders"."product_id" IN (1,95,79,22) AND "table1"."col1" IN ('1041', '1042') AND (orders.active_subscription_id = subscriptions_1.t0_r0))
) ;

Solution

  • The plan is so bad because PostgreSQL underestimates the number of result rows (1 instead of the actual 3383 in the join between subscriptions and orders).

    That causes PostgreSQL to pick a nested loop join for the join with table1, which is where 9 of your 11 seconds are spent.

    There are several approaches:

    1. Run ANALYZE, perhaps with increased default_statistics_target, on all tables affected. Perhaps fresh statistics will lead to a better estimate.

    2. If that doesn't help, create an index ON table1(order_id, col1::text), which will speed up the nested loop join as much as possible.

    3. The brutal way: set enable_nestloop to off for this one query.