Search code examples
sql-servergroup-bypartition-by

SQL Group By a Partition By


This must be accomplished in MS SQL Server. I believe OVER( PARTITION BY) must be used, but I've failed at all my tries and I end up counting the records to each ID or something else...

I have this table:

| ID   | COLOR  |
+------+--------+
| 1    | Red    |
| 1    | Green  |
| 1    | Blue   |
| 2    | Red    |
| 2    | Green  |
| 2    | Blue   |
| 3    | Red    |
| 3    | Brown  |
| 3    | Orange |

Notice that ID = 1 and ID = 2 have precisely the same values for COLOR, however ID = 3 only shares the value COLOR = Red.

I would like to group the table as follows:

| COLOR  | COUNT | GROUPING |
+--------+-------+----------+
| Red    | 2     | Type 1   |
| Green  | 2     | Type 1   |
| Blue   | 2     | Type 1   |
| Red    | 1     | Type 2   |
| Brown  | 1     | Type 2   |
| Orange | 1     | Type 2   |

This would mean that ID = 1 and ID = 2 share the same 3 values for color and they are aggregated together as Type 1. Although ID = 3 shares one value for color to ID = 1 and ID = 2 (which is 'Red') the rest of the values are not shared, as such it is considered of Type 2 (different grouping).

The tables used are simple examples and are enough to replicate to the entire dateset, however each ID can have in theory hundreds of records with different values for colors in each row. However they are unique, one ID can't have the the same color in different rows.

My best attempt:

SELECT
    ID, 
    COLOR,
    CONCAT ('TYPE ', COUNT(8) OVER( PARTITION by ID)) AS COLOR_GROUP
FROM 
    {TABLE};

Result:

| ID   | COLOR  | GROUPING |
+------+--------+----------+
| 1    | Green  | Type 3   |
| 1    | Blue   | Type 3   |
| 1    | Red    | Type 3   |
| 2    | Green  | Type 3   |
| 2    | Blue   | Type 3   |
| 2    | Red    | Type 3   |
| 3    | Red    | Type 3   |
| 3    | Brown  | Type 3   |
| 3    | Orange | Type 3   |

Although the results are terrible I've tried different methods, none of them is better.

Hope I was clear enough.

Thank you for the help!


Solution

  • try the following:

    declare @t table ( ID  int,COLOR varchar(100))
    insert into @t select 1   ,'Red'    
    insert into @t select 1   ,'Green'  
    insert into @t select 1   ,'Blue'   
    insert into @t select 2   ,'Red'    
    insert into @t select 2   ,'Green'  
    insert into @t select 2   ,'Blue'   
    insert into @t select 3   ,'Red'    
    insert into @t select 3   ,'Brown'  
    insert into @t select 3   ,'Orange'
    
    
    select *, STUFF((SELECT CHAR(10) + ' '+COLOR
                      FROM @t t_in where t_in.ID=t.ID
                      order by COLOR
                       FOR XML PATH ('')) , 1, 1, '') COLOR_Combined 
    into #temp
    from @t t
    
    select COLOR, count(color) [COUNT], 'TYPE ' + convert(varchar(10), dense_rank() OVER (order by [grouping])) [GROUPING]
    from
    (   
        select id, COLOR, COLOR_Combined,  (row_number() over (order by id) - row_number() over (partition by Color_Combined order by id)) [grouping]
        from #temp
    )t
    group by COLOR, [grouping]
    drop table if exists #temp
    

    Please find the db<>fiddle here.