Search code examples
sqlpostgresqlpostgresql-performance

Adding OR clause to where condition is much slower than two individual queries


There are 2 tables, main_table and other_table. A user is allowed access to main_table items if his user_id matches the main_table.user_id column or if there is an entry in other_table where his user_id is in the read_acl column (array using gin index). Both tables have about 2 million rows.

The query is pretty slow especially, it seems Postgresql is doing an index scan on all entries in the main_table. If I remove the or clause and rewrite it to 2 queries then the speed is much better. Can this be solved somehow?

Current query is this:

select *
    from main_table
    where user_id = 123 or exists (select * from other_table f 
                                   where main_table.main_table_id = f.main_table_id 
                                   and '{123}' && read_acl)
    order by main_table_id limit 10;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..170.97 rows=10 width=2590) (actual time=172.734..389.819 rows=10 loops=1)
   ->  Index Scan using main_table_pk on main_table  (cost=0.43..14158795.44 rows=830198 width=2590) (actual time=172.733..389.811 rows=10 loops=1)
         Filter: ((user_id = 123) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
         Rows Removed by Filter: 776709
         SubPlan 1
           ->  Index Scan using other_table_main_table_id_idx on other_table f  (cost=0.43..8.45 rows=1 width=0) (never executed)
                 Index Cond: (main_table.main_table_id = main_table_id)
                 Filter: ('{123}'::text[] && read_acl)
         SubPlan 2
           ->  Bitmap Heap Scan on other_table f_1  (cost=1678.58..2992.04 rows=333 width=8) (actual time=9.413..9.432 rows=12 loops=1)
                 Recheck Cond: ('{123}'::text[] && read_acl)
                 Heap Blocks: exact=12
                 ->  Bitmap Index Scan on other_table_read_acl  (cost=0.00..1678.50 rows=333 width=0) (actual time=9.401..9.401 rows=12 loops=1)
                       Index Cond: ('{123}'::text[] && read_acl)
 Planning Time: 0.395 ms
 Execution Time: 389.877 ms
(16 rows)

Rewritten query 1 first part of OR clause:

    select *
    from main_table
    where user_id = 123
    order by main_table_id limit 10;

                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=482.27..482.29 rows=10 width=2590) (actual time=0.039..0.040 rows=8 loops=1)
   ->  Sort  (cost=482.27..482.58 rows=126 width=2590) (actual time=0.038..0.039 rows=8 loops=1)
         Sort Key: main_table_id
         Sort Method: quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on main_table  (cost=5.40..479.54 rows=126 width=2590) (actual time=0.020..0.031 rows=8 loops=1)
               Recheck Cond: (user_id = 123)
               Heap Blocks: exact=8
               ->  Bitmap Index Scan on test500  (cost=0.00..5.37 rows=126 width=0) (actual time=0.015..0.015 rows=8 loops=1)
                     Index Cond: (user_id = 123)
 Planning Time: 0.130 ms
 Execution Time: 0.066 ms
(11 rows)

Rewritten query 2:

    select *
    from main_table
    where exists (select * from other_table f 
                  where main_table.main_table_id = f.main_table_id 
                  and '{123}' && read_acl)
    order by main_table_id limit 10;
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5771.59..5771.62 rows=10 width=2590) (actual time=8.083..8.086 rows=10 loops=1)
   ->  Sort  (cost=5771.59..5772.42 rows=333 width=2590) (actual time=8.082..8.083 rows=10 loops=1)
         Sort Key: main_table.main_table_id
         Sort Method: quicksort  Memory: 26kB
         ->  Nested Loop  (cost=2985.30..5764.40 rows=333 width=2590) (actual time=8.018..8.072 rows=12 loops=1)
               ->  HashAggregate  (cost=2984.88..2988.21 rows=333 width=8) (actual time=7.999..8.004 rows=12 loops=1)
                     Group Key: f.main_table_id
                     ->  Bitmap Heap Scan on other_table f  (cost=1670.58..2984.04 rows=333 width=8) (actual time=7.969..7.990 rows=12 loops=1)
                           Recheck Cond: ('{123}'::text[] && read_acl)
                           Heap Blocks: exact=12
                           ->  Bitmap Index Scan on other_table_read_acl  (cost=0.00..1670.50 rows=333 width=0) (actual time=7.957..7.958 rows=12 loops=1)
                                 Index Cond: ('{123}'::text[] && read_acl)
               ->  Index Scan using main_table_pk on main_table  (cost=0.43..8.34 rows=1 width=2590) (actual time=0.005..0.005 rows=1 loops=12)
                     Index Cond: (main_table_id = f.main_table_id)
 Planning Time: 0.431 ms
 Execution Time: 8.137 ms
(16 rows)

Solution

  • That is as expected. OR in a WHERE condition often is a problem for the optimizer. Yes, you can usually rewrite the query to a UNION or UNION ALL of the partial queries you show in your question, but this is not something that the optimizer can do automatically.

    Consider this example:

    CREATE TABLE a (
       id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
       x integer,
       p integer
    );
    
    CREATE TABLE b (
       id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
       x integer,
       q integer
    );
    
    INSERT INTO a (x, p) VALUES
       (1, 1),
       (1, 1),
       (2, 1);
    
    INSERT INTO b (x, q) VALUES
       (1, 3),
       (2, 3);
    

    Now the query with OR gives you:

    SELECT x, a.p, b.q
    FROM a JOIN b USING (x)
    WHERE a.p = 1 OR b.q = 3;
    
     x │ p │ q 
    ═══╪═══╪═══
     1 │ 1 │ 3
     1 │ 1 │ 3
     2 │ 1 │ 3
    (3 rows)
    

    Rewriting the query with UNION or UNION ALL produces a different result:

    SELECT x, a.p, b.q
    FROM a JOIN b USING (x)
    WHERE a.p = 1
    UNION
    SELECT x, a.p, b.q
    FROM a JOIN b USING (x)
    WHERE b.q = 3;
    
     x │ p │ q 
    ═══╪═══╪═══
     2 │ 1 │ 3
     1 │ 1 │ 3
    (2 rows)
    
    SELECT x, a.p, b.q
    FROM a JOIN b USING (x)
    WHERE a.p = 1
    UNION ALL
    SELECT x, a.p, b.q
    FROM a JOIN b USING (x)
    WHERE b.q = 3;
    
     x │ p │ q 
    ═══╪═══╪═══
     1 │ 1 │ 3
     1 │ 1 │ 3
     2 │ 1 │ 3
     1 │ 1 │ 3
     1 │ 1 │ 3
     2 │ 1 │ 3
    (6 rows)
    

    The case in your question uses SELECT *, which will include the primary key, so in this case it would be safe to use UNION. But that is reasoning that goes beyond the capabilities of the optimizer.