Search code examples
sqlsql-servergroup-bydistinctrollup

Group By With Rollup and Count(Distinct)


I'm running into a slight issue with using a Group By With Rollup while getting a Count of a Distinct.

The issue is that the Rollup summary is only the total count of Distinct values across all groupings, rather than a summary of all groupings.

Here's a test scenario to illustrate what I mean:

Create Table #Test
(
    GroupId Int Not Null,
    Value   Int Not Null
)

Insert  #Test (GroupId, Value)
Values  (1, 1),(1, 2),(1, 3),(1, 4),(1, 5),(1, 5),(1, 1),
        (2, 1),(2, 6),(2, 7),(2, 5),(2, 7),(2, 5),
        (3, 9),(3, 10),(3, 11),(3, 4),(3, 5),(3, 7),(3, 8),(3, 5),(3, 7),(3, 8)

For this particular table, if I run this query:

Select  Case When Grouping(GroupId) = 1 Then 'Total:' Else Str(GroupId) End As GroupId, 
        Count(Distinct Value) Count
From    #Test
Group By GroupId With Rollup
Order By Grouping(GroupId), GroupId

I'm getting the following result:

GroupId Count
-------------
1       5
2       4
3       7
Total:  11  

My expected result for the Total row would be 16, but instead I'm only getting 11 -- which is the total number of Distinct Values among all groups.

Removing the Distinct from the query does show the expected results for that Rollup:

Select  Case When Grouping(GroupId) = 1 Then 'Total:' Else Str(GroupId) End As GroupId, 
        Count(Value) Count
From    #Test
Group By GroupId With Rollup
Order By Grouping(GroupId), GroupId

Which yields these results:

GroupId Count
-------------
1       7
2       6
3       10
Total:  23

Which summarizes the groups as expected.

My question is: is this normal for a Rollup on a Count Distinct? Is there some other Rollup-like option to use for the Grouping to get 16 displayed rather than 11 in the example above?


Solution

  • You can get what you want by nesting the queries and using a trick:

    select (Case When Grouping(GroupId) = 1 Then 'Total:' Else Str(GroupId) End) as GroupId, 
           Sum(Count) as Count
    from (Select GroupId, 
                 Count(Distinct Value) as Count
          From  #Test
          Group By GroupId
         ) t
    Group By GroupId With Rollup
    Order By Grouping(GroupId), GroupId;
    

    The second group by doesn't logically do an aggregation, because there is only one row per group. It is just there for getting the value you want in the rollup.