Search code examples
sqlclickhouserelational-division

Query that will find users who post THE SAME SET of marks as user2


I have a table ORDERS which consists of columns "user", "mark" and "mark_cnt". And I need to write such an sql query that will output me those users which have the same set of marks as user2. He has one m1, two m2 and one m4. So, basically, it means that this set must fully intersect with other users marks.

U1 has: m1, m2, m2, m3, m4, m4, m4
U2 has: m1, m2, m2, m4
U3 has: m1, m2, m2, m2, m3, m3, m3, m3, m3, m3, m3, m4, m4, m4, m5
U4 has: m1, m2, m4

I tried to use this code:

select user
from orders
where mark in  (
               select mark from orders where user='u2'
               )
group by user
having count(mark) = (select count(mark) from order where user='u2')

click here to see table image

The correct answer must be U1 and U3


Solution

  • This is classic Relational Division With Remainder, with the slight tweak that mark_cnt matches if it is greater than or equal to the original set's row.

    There are many solutions. The most efficient is usually to use a having to compare the count of matches.

    select others.`user`
    from orders as original
    join orders as others
      on others.mark = original.mark
     and others.mark_cnt >= original.mark_cnt
     and others.`user` <> original.`user`
    where original.`user` = 'U2'
    group by
      others.`user`
    having
      count(*) = (select count(*) from orders as o where o.`user` = 'U2');
    

    db<>fiddle

    Note that the other answer does not return correct results in the case where there is at least one row in the original set which doesn't match anywhere.