Search code examples
mysqlsqlgroup-byinner-joingroup-concat

Combine to one group with the same id or attribute SQL Group_concat doesn't help:(


I need to group row's in one row if they have same id or same attribute. So i suppose i need to use INNER JOIN and GROUP_CONCAT, but i don't know how. Problem is that if two users do not have a common attribute, but fall into the same group with the same third user, all three must be combined into one group. Also i don't have group_id column in table.

group_id, user_id, group_attributes
1, 1, "red, green, yellow, grey, purple, coffeemaker"
1, 2, "red, green, yellow, grey, purple, coffeemaker"
1, 3, "red, green, yellow, grey, purple, coffeemaker"
1, 4, "red, green, yellow, grey, purple, coffeemaker"
1, 5, "red, green, yellow, grey, purple, coffeemaker"
2, 6, "coffee, milk, croissant"
2, 7, "coffee, milk, croissant"
2, 8, "coffee, milk, croissant"

Raw data to reduce your answer time.

CREATE TABLE task (
    user_id INT(10) NOT NULL,
    attribute VARCHAR(50) NULL DEFAULT NULL);
INSERT INTO task (user_id, attribute)
VALUES
(1, 'red'),
(1, 'green'),
(2, 'green'),
(2, 'yellow'),
(3, 'grey'),
(3, 'coffeemaker'),
(4, 'grey'),
(4, 'purple'),
(5, 'purple'),
(5, 'red'),
(6, 'black'),
(7, 'black'),
(7, 'milk'),
(8, 'milk'),
(8, 'croissant');


Solution

  • This is a graph-walking problem, so a simple JOIN is not sufficient. One approach is to get all attributes associates with a given attribute. The following recursive CTE does this:

    with recursive aa as (
          select distinct t1.attribute as at1, t2.attribute as at2
          from task t1 join
               task t2
               on t1.user_id = t2.user_id
         ),
         cte as (
          select at1, at2, at1 as found, 1 as lev
          from aa
          union all
          select cte.at1, aa.at2, concat_ws(',', found, aa.at2), lev + 1
          from cte join
               aa
               on cte.at2 = aa.at1
          where find_in_set(aa.at2, found) = 0 
         )
    select distinct at1, at2
    from cte;
    

    You can then use the same recursive CTE to combine the values into a string:

    with recursive aa as (
          select distinct t1.attribute as at1, t2.attribute as at2
          from task t1 join
               task t2
               on t1.user_id = t2.user_id
         ),
         cte as (
          select at1, at2, at1 as found, 1 as lev
          from aa
          union all
          select cte.at1, aa.at2, concat_ws(',', found, aa.at2), lev + 1
          from cte join
               aa
               on cte.at2 = aa.at1
          where find_in_set(aa.at2, found) = 0 
         )
    select dense_rank() over (order by pairs.all_attributes) as group_id, t.user_id, pairs.all_attributes
    from (select at1, group_concat(at2) as all_attributes
          from cte
          group by at1
         ) pairs join
         (select user_id, min(attribute) as min_attribute
          from task
          group by user_id
         ) t
         on t.min_attribute = pairs.at1;
    

    I see nothing wrong with this code. But db<>fiddle is insisting on creating a hex string for pairs. However, I think this will work on your database. Here is the fiddle.