There are 2 tables, main_table
and other_table
. A user is allowed access to main_table
items if his user_id
matches the main_table.user_id
column or if there is an entry in other_table
where his user_id
is in the read_acl
column (array using gin index). Both tables have about 2 million rows.
The query is pretty slow especially, it seems Postgresql is doing an index scan on all entries in the main_table
. If I remove the or
clause and rewrite it to 2 queries then the speed is much better. Can this be solved somehow?
Current query is this:
select *
from main_table
where user_id = 123 or exists (select * from other_table f
where main_table.main_table_id = f.main_table_id
and '{123}' && read_acl)
order by main_table_id limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..170.97 rows=10 width=2590) (actual time=172.734..389.819 rows=10 loops=1)
-> Index Scan using main_table_pk on main_table (cost=0.43..14158795.44 rows=830198 width=2590) (actual time=172.733..389.811 rows=10 loops=1)
Filter: ((user_id = 123) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
Rows Removed by Filter: 776709
SubPlan 1
-> Index Scan using other_table_main_table_id_idx on other_table f (cost=0.43..8.45 rows=1 width=0) (never executed)
Index Cond: (main_table.main_table_id = main_table_id)
Filter: ('{123}'::text[] && read_acl)
SubPlan 2
-> Bitmap Heap Scan on other_table f_1 (cost=1678.58..2992.04 rows=333 width=8) (actual time=9.413..9.432 rows=12 loops=1)
Recheck Cond: ('{123}'::text[] && read_acl)
Heap Blocks: exact=12
-> Bitmap Index Scan on other_table_read_acl (cost=0.00..1678.50 rows=333 width=0) (actual time=9.401..9.401 rows=12 loops=1)
Index Cond: ('{123}'::text[] && read_acl)
Planning Time: 0.395 ms
Execution Time: 389.877 ms
(16 rows)
Rewritten query 1 first part of OR clause:
select *
from main_table
where user_id = 123
order by main_table_id limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=482.27..482.29 rows=10 width=2590) (actual time=0.039..0.040 rows=8 loops=1)
-> Sort (cost=482.27..482.58 rows=126 width=2590) (actual time=0.038..0.039 rows=8 loops=1)
Sort Key: main_table_id
Sort Method: quicksort Memory: 25kB
-> Bitmap Heap Scan on main_table (cost=5.40..479.54 rows=126 width=2590) (actual time=0.020..0.031 rows=8 loops=1)
Recheck Cond: (user_id = 123)
Heap Blocks: exact=8
-> Bitmap Index Scan on test500 (cost=0.00..5.37 rows=126 width=0) (actual time=0.015..0.015 rows=8 loops=1)
Index Cond: (user_id = 123)
Planning Time: 0.130 ms
Execution Time: 0.066 ms
(11 rows)
Rewritten query 2:
select *
from main_table
where exists (select * from other_table f
where main_table.main_table_id = f.main_table_id
and '{123}' && read_acl)
order by main_table_id limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=5771.59..5771.62 rows=10 width=2590) (actual time=8.083..8.086 rows=10 loops=1)
-> Sort (cost=5771.59..5772.42 rows=333 width=2590) (actual time=8.082..8.083 rows=10 loops=1)
Sort Key: main_table.main_table_id
Sort Method: quicksort Memory: 26kB
-> Nested Loop (cost=2985.30..5764.40 rows=333 width=2590) (actual time=8.018..8.072 rows=12 loops=1)
-> HashAggregate (cost=2984.88..2988.21 rows=333 width=8) (actual time=7.999..8.004 rows=12 loops=1)
Group Key: f.main_table_id
-> Bitmap Heap Scan on other_table f (cost=1670.58..2984.04 rows=333 width=8) (actual time=7.969..7.990 rows=12 loops=1)
Recheck Cond: ('{123}'::text[] && read_acl)
Heap Blocks: exact=12
-> Bitmap Index Scan on other_table_read_acl (cost=0.00..1670.50 rows=333 width=0) (actual time=7.957..7.958 rows=12 loops=1)
Index Cond: ('{123}'::text[] && read_acl)
-> Index Scan using main_table_pk on main_table (cost=0.43..8.34 rows=1 width=2590) (actual time=0.005..0.005 rows=1 loops=12)
Index Cond: (main_table_id = f.main_table_id)
Planning Time: 0.431 ms
Execution Time: 8.137 ms
(16 rows)
That is as expected. OR
in a WHERE
condition often is a problem for the optimizer. Yes, you can usually rewrite the query to a UNION
or UNION ALL
of the partial queries you show in your question, but this is not something that the optimizer can do automatically.
Consider this example:
CREATE TABLE a (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
x integer,
p integer
);
CREATE TABLE b (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
x integer,
q integer
);
INSERT INTO a (x, p) VALUES
(1, 1),
(1, 1),
(2, 1);
INSERT INTO b (x, q) VALUES
(1, 3),
(2, 3);
Now the query with OR
gives you:
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE a.p = 1 OR b.q = 3;
x │ p │ q
═══╪═══╪═══
1 │ 1 │ 3
1 │ 1 │ 3
2 │ 1 │ 3
(3 rows)
Rewriting the query with UNION
or UNION ALL
produces a different result:
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE a.p = 1
UNION
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE b.q = 3;
x │ p │ q
═══╪═══╪═══
2 │ 1 │ 3
1 │ 1 │ 3
(2 rows)
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE a.p = 1
UNION ALL
SELECT x, a.p, b.q
FROM a JOIN b USING (x)
WHERE b.q = 3;
x │ p │ q
═══╪═══╪═══
1 │ 1 │ 3
1 │ 1 │ 3
2 │ 1 │ 3
1 │ 1 │ 3
1 │ 1 │ 3
2 │ 1 │ 3
(6 rows)
The case in your question uses SELECT *
, which will include the primary key, so in this case it would be safe to use UNION
. But that is reasoning that goes beyond the capabilities of the optimizer.