Search code examples
sqldatabaselabelrdbms

SQL, label user based on the similarity


Is below case possible in SQL?

Let say I have a table like this:

user_id product_id
1 123
1 122
1 121
2 124
2 125
2 121
3 123
3 122
3 122
4 123
4 212
4 222
5 124
5 125
5 121

I want to label the user if they have same product_id, regardless the order, so the output looks like this:

user_id product_id label
1 123 a
1 122 a
1 121 a
2 124 b
2 125 b
2 121 b
3 123 a
3 121 a
3 122 a
4 123 c
4 212 c
4 222 c
5 124 b
5 125 b
5 121 b

Please advise


Solution

  • You can use the string_agg function to get the list of product_ids for each user (as a single string), then use the dense_rank function on that string to get unique labels for each product_ids list.

    select T.user_id, T.product_id, D.label
    from table_name T join
    (
      select user_id, 
        chr(dense_rank() over (order by user_products) + 96) label
      from
      (  
       select user_id, 
        string_agg(cast(product_id as string), ',' order by product_id)  user_products
       from table_name
       group by user_id
      ) lbl
    ) D
    on T.user_id = D.user_id
    order by T.user_id