Search code examples
sqlpostgresqloptimizationquery-optimization

Optimizing a one to many query that's using a subselect to paginate


I was hoping to get some expert eyes over my query and see why I'm receiving different performance.

The problem I'm trying to solve is I need orders that can have one to many items. These orders need to be paginated.

To do this I've taken the following approach. I'm using a sub query to filter orders by the required item attributes. I'm then rejoining to the items to get their required fields. This means that when paginating I will not incorrectly filter order rows when there are orders with 2 or more items.

I'm seeing intermittently slow queries. The second time they are run they're much quicker. I presume this is because Postgres is loading indexes and such into memory?

I don't fully understand what is happening From the explain. It's looking like it needs to scan every order to see if they have an item that fits the subquery? I'm a bit confused by the following line. It's saying it needs to scan 286853 rows but also only 165?

Index Scan Backward using orders_created_at_idx on orders  (cost=0.42..2708393.65 rows=286853 width=301) (actual time=64.598..2114.676 rows=165 loops=1)

Is there a way to get Postgres to filter by the items first or am I reading this incorrectly and it is doing that already?

Query:

SELECT 
  "orders"."id_orders" as "orders.id_orders", 
  "items"."id_items" as "items"."id_items",
  ..., 
  orders.created_at, orders.updated_at 
FROM (
  SELECT 
    orders.id_orders,
    orders.created_at,
    orders.updated_at
  FROM orders 
  WHERE orders.status in ('completed','pending') AND 
  (
    SELECT fk_vendor_id FROM items
    WHERE (
      items.fk_order_id = orders.id_orders AND
      items.fk_vendor_id = '0012800001YVccUAAT' AND
      items.fk_offer = '0060I00000RAKFYQA5' AND
      items.status IN ('completed','cancelled')
    ) LIMIT 1
  ) IS NOT NULL ORDER BY orders.created_at DESC LIMIT 50 OFFSET 150
) as orders INNER JOIN items ON items.fk_order_id = orders.id_orders;

1st explain:

Nested Loop  (cost=1417.11..2311.77 rows=67 width=1705) (actual time=2785.221..17025.325 rows=17 loops=1)
  ->  Limit  (cost=1416.68..1888.77 rows=50 width=301) (actual time=2785.216..17024.918 rows=15 loops=1)
        ->  Index Scan Backward using orders_created_at_idx on orders  (cost=0.42..2708393.65 rows=286853 width=301) (actual time=1214.013..17024.897 rows=165 loops=1)
              Filter: ((status = ANY ('{completed,pending}'::orders_status_enum[])) AND ((SubPlan 1) IS NOT NULL))
              Rows Removed by Filter: 313631
              SubPlan 1
                ->  Limit  (cost=0.42..8.45 rows=1 width=19) (actual time=0.047..0.047 rows=0 loops=287719)
                      ->  Index Scan using items_fk_order_id_index on items items_1  (cost=0.42..8.45 rows=1 width=19) (actual time=0.047..0.047 rows=0 loops=287719)
                            Index Cond: (fk_order_id = orders.id_orders)
                            Filter: ((status = ANY ('{completed,cancelled}'::items_status_enum[])) AND (fk_vendor_id = '0012800001YVccUAAT'::text) AND (fk_offer = '0060I00000RAKFYQA5'::text))
                            Rows Removed by Filter: 1
  ->  Index Scan using items_fk_order_id_index on items  (cost=0.42..8.44 rows=1 width=1404) (actual time=0.002..0.026 rows=1 loops=15)
        Index Cond: (fk_order_id = orders.id_orders)
Planning time: 1.791 ms
Execution time: 17025.624 ms
(15 rows)

2nd explain:

Nested Loop  (cost=1417.11..2311.77 rows=67 width=1705) (actual time=115.659..2114.739 rows=17 loops=1)
  ->  Limit  (cost=1416.68..1888.77 rows=50 width=301) (actual time=115.654..2114.691 rows=15 loops=1)
        ->  Index Scan Backward using orders_created_at_idx on orders  (cost=0.42..2708393.65 rows=286853 width=301) (actual time=64.598..2114.676 rows=165 loops=1)
              Filter: ((status = ANY ('{completed,pending}'::orders_status_enum[])) AND ((SubPlan 1) IS NOT NULL))
              Rows Removed by Filter: 313631
              SubPlan 1
                ->  Limit  (cost=0.42..8.45 rows=1 width=19) (actual time=0.006..0.006 rows=0 loops=287719)
                      ->  Index Scan using items_fk_order_id_index on items items_1  (cost=0.42..8.45 rows=1 width=19) (actual time=0.006..0.006 rows=0 loops=287719)
                            Index Cond: (fk_order_id = orders.id_orders)
                            Filter: ((status = ANY ('{completed,cancelled}'::items_status_enum[])) AND (fk_vendor_id = '0012800001YVccUAAT'::text) AND (fk_offer = '0060I00000RAKFYQA5'::text))
                            Rows Removed by Filter: 1
  ->  Index Scan using items_fk_order_id_index on items  (cost=0.42..8.44 rows=1 width=1404) (actual time=0.002..0.002 rows=1 loops=15)
        Index Cond: (fk_order_id = orders.id_orders)
Planning time: 2.011 ms
Execution time: 2115.052 ms
(15 rows)

Order indexes:

"cart_pkey" PRIMARY KEY, btree (id_orders)
"orders_legacy_id_uindex" UNIQUE, btree (legacy_id_orders)
"orders_transaction_key_uindex" UNIQUE, btree (transaction_key)
"orders_created_at_idx" btree (created_at)
"orders_customer_email_idx" gin (customer_email gin_trgm_ops)
"orders_customer_full_name_idx" gin (customer_full_name gin_trgm_ops)
Referenced by:
TABLE "items" CONSTRAINT "items_fk_order_id_fkey" FOREIGN KEY (fk_order_id) REFERENCES orders(id_orders) ON DELETE RESTRICT
TABLE "items_log" CONSTRAINT "items_log_fk_order_id_fkey" FOREIGN KEY (fk_order_id) REFERENCES orders(id_orders)

Items indexes:

"items_pkey" PRIMARY KEY, btree (id_items)
"items_fk_vendor_id_booking_number_unique" UNIQUE, btree (fk_vendor_id, booking_number) WHERE legacy_id_items IS NULL
"items_legacy_id_uindex" UNIQUE, btree (legacy_id_items)
"items_transaction_key_uindex" UNIQUE, btree (transaction_key)
"items_booking_number_index" btree (booking_number)
"items_fk_order_id_index" btree (fk_order_id)
"items_fk_vendor_id_index" btree (fk_vendor_id)
"items_status_index" btree (status)

Foreign-key constraints:
"items_fk_order_id_fkey" FOREIGN KEY (fk_order_id) REFERENCES orders(id_orders) ON DELETE RESTRICT

Solution

  • The difference in execution times is probably really the effect of caching. You could use EXPLAIN (ANALYZE, BUFFERS) to see how many pages are found in the database cache.

    To make your query more readable, you should rewrite

    WHERE (
       SELECT fk_vendor_id FROM items
       WHERE (
         items.fk_order_id = orders.id_orders AND
         items.fk_vendor_id = '0012800001YVccUAAT' AND
         items.fk_offer = '0060I00000RAKFYQA5' AND
         items.status IN ('completed','cancelled')
       ) LIMIT 1
    ) IS NOT NULL
    

    to

    WHERE NOT EXISTS
       (SELECT 1 FROM items
        WHERE items.fk_order_id = orders.id_orders
          AND items.fk_vendor_id = '0012800001YVccUAAT'
          AND items.fk_offer = '0060I00000RAKFYQA5'
          AND items.status IN ('completed','cancelled')
       )
    

    The best thing you could do to speed up the query is to create an index:

    CREATE INDEX ON items(fk_order_id, fk_vendor_id, fk_offer);