Search code examples
databasepostgresqlgroup-byrelational-division

Avoiding a quad join while finding all users that match a set of votes on an item


Let's say the following tables exist:

a user table with user_id

an item table with item_id

and a vote table with user_id, item_id, value

where value can be the integers 0 through 3. Users can vote on items.

I am looking for a way to perform a query such that I can find all users that have voted the same as an input user with a set of specified items and corresponding vote values.

For example, say I send a request to the app server with the following information:

user_id: 5
votes: [
  { item_id: 7, vote_value: 0 }, 
  { item_id: 3, vote_value: 3 }, 
  { item_id: 5, vote_value: 1 }, 
  { item_id: 41, vote_value: 3 }, 
  { item_id: 23, vote_value: 2 }
]

I would like to then return all the users that at least share these results (that is, all other users that voted 0 on item 7, voted 3 on item 3, voted 1 on item 5, voted 3 on item 41, and voted 2 on item 23). They can have more votes and other votes on other items, but at least those must match. Also, five voted items passed in is just an example. There are an arbitrary amount.

The only solution I have come up with so far involves a quad join and having a column for all 0 votes, all 1 votes, all 2 votes, and all 3 votes for each user, and then returning that entire dataset to the app-server to perform calculations there to find matches. But I was hoping there is a better, more performant way.


Solution

  • One way to do it

    SELECT user_id
      FROM votes
     GROUP BY user_id
    HAVING MAX(CASE WHEN (item_id, value) = (7, 0)  THEN 1 ELSE 0 END) = 1
       AND MAX(CASE WHEN (item_id, value) = (3, 3)  THEN 1 ELSE 0 END) = 1
       AND MAX(CASE WHEN (item_id, value) = (5, 1)  THEN 1 ELSE 0 END) = 1
       AND MAX(CASE WHEN (item_id, value) = (41, 3) THEN 1 ELSE 0 END) = 1
       AND MAX(CASE WHEN (item_id, value) = (23, 2) THEN 1 ELSE 0 END) = 1
    

    SQLFiddle

    The HAVING clause can also be expressed in the following manner

    ...
    HAVING MAX(CASE WHEN (item_id, value) = (7, 0)  THEN 1 ELSE 0 END) 
         + MAX(CASE WHEN (item_id, value) = (3, 3)  THEN 1 ELSE 0 END)
         + MAX(CASE WHEN (item_id, value) = (5, 1)  THEN 1 ELSE 0 END)
         + MAX(CASE WHEN (item_id, value) = (41, 3) THEN 1 ELSE 0 END)
         + MAX(CASE WHEN (item_id, value) = (23, 2) THEN 1 ELSE 0 END) = 5
    

    SQLFiddle