sql postgresql indexing many-to-many relational-division

AND conditions in many-to-many relation

Say I have three tables, a table of users, a table of around 500 different items, and the corresponding join table. What I would like to do is:

select * from users u join items_users iu on iu.user_id = u.id
where iu.item_id in (1,2,3,4,5)
and u.city_id = 1 limit 10;

Except, instead of an IN condition, I would like to find users that have all the corresponding items. If it helps, assume that the max number of items that will be searched for at a time will be 5. Also, I am using Postgres, and don't mind denormalizing it if would help as it's a read only system and speed is highest priority.

Solution

It's another case of relational division. We have assembled quite an arsenal of queries to deal with this class of problems here.

In this case, with 5 or more items, I might try:

SELECT u.*
FROM   users AS u
WHERE  u.city_id = 1
AND EXISTS (
   SELECT *
   FROM   items_users AS a
   JOIN   items_users AS b USING (user_id)
   JOIN   items_users AS c USING (user_id)
   ...
   WHERE  a.user_id = u.user_id
   AND    a.item_id = 1
   AND    b.item_id = 2
   AND    c.item_id = 3
   ...
   )
LIMIT 10;

It was among the fastest in my tests and it fits the requirement of multiple criteria on items_users while only returning columns from user.

Read about indexes at the linked answer. these are crucial for performance. As your tables are read-only I would also CLUSTER both tables, to minimize the number of pages that have to be visited. If nothing else, CLUSTER items_users using a multi-column index on (user_id, item_id).