Search code examples
sqlimpala

SQL Group by within another group by


I have currently a problem where I need to find a count of groups within a group. However, I want to create on line for the encompassing group and concatenate the smaller group within a string.

The following table shows the situation I am currently facing:

----------------------------------------
Timestamp           ID             Member
----------------------------------------
1                   1              A
1                   1              B
1                   2              A
1                   2              B
1                   2              C
2                   1              A
2                   2              A
2                   2              A
2                   2              C
----------------------------------------

Now I want to find for each timestamp, how many (distinct) members of each ID are in the table, i.e., the result should look like this:


Timestamp           MemberIDCount
----------------------------------------
1                   1:2,2:3
2                   1:1,2:2
----------------------------------------

i.e., the format of the string is:

[ID]:[count(distinct(member)],...

I know, that you can solve this with two consecutive group bys (i.e., first on timestamp and ID to count members and the on timestamp for the string concatenation). However, I hope that there is a smarter solution, since I have to apply that to a large dataset and dont want to have to execute 2 group bys. I work with Cloudera Impala, but solutions in other SQL languages are also appreciated.

Thank you for your help.


Solution

  • You can do this using group by twice and group_concat(). I don't have Impala on hand, but something like this should work:

    select timestamp,
           group_concat(concat_ws(':', member, cnt))
    from (select timestamp, member, count(*) as cnt
          from t
          group by timestamp, member
         ) tm
    group by timestamp
    order by timestamp;