Say I have three tables, a table of users, a table of around 500 different items, and the corresponding join table. What I would like to do is:
select * from users u join items_users iu on iu.user_id = u.id
where iu.item_id in (1,2,3,4,5)
and u.city_id = 1 limit 10;
Except, instead of an IN
condition, I would like to find users that have all the corresponding items. If it helps, assume that the max number of items that will be searched for at a time will be 5. Also, I am using Postgres, and don't mind denormalizing it if would help as it's a read only system and speed is highest priority.
It's another case of relational division. We have assembled quite an arsenal of queries to deal with this class of problems here.
In this case, with 5 or more items, I might try:
SELECT u.*
FROM users AS u
WHERE u.city_id = 1
AND EXISTS (
SELECT *
FROM items_users AS a
JOIN items_users AS b USING (user_id)
JOIN items_users AS c USING (user_id)
...
WHERE a.user_id = u.user_id
AND a.item_id = 1
AND b.item_id = 2
AND c.item_id = 3
...
)
LIMIT 10;
It was among the fastest in my tests and it fits the requirement of multiple criteria on items_users
while only returning columns from user
.
Read about indexes at the linked answer. these are crucial for performance.
As your tables are read-only I would also CLUSTER
both tables, to minimize the number of pages that have to be visited. If nothing else, CLUSTER items_users
using a multi-column index on (user_id, item_id)
.