I have currently a problem where I need to find a count of groups within a group. However, I want to create on line for the encompassing group and concatenate the smaller group within a string.
The following table shows the situation I am currently facing:
----------------------------------------
Timestamp ID Member
----------------------------------------
1 1 A
1 1 B
1 2 A
1 2 B
1 2 C
2 1 A
2 2 A
2 2 A
2 2 C
----------------------------------------
Now I want to find for each timestamp, how many (distinct) members of each ID are in the table, i.e., the result should look like this:
Timestamp MemberIDCount
----------------------------------------
1 1:2,2:3
2 1:1,2:2
----------------------------------------
i.e., the format of the string is:
[ID]:[count(distinct(member)],...
I know, that you can solve this with two consecutive group bys (i.e., first on timestamp and ID to count members and the on timestamp for the string concatenation). However, I hope that there is a smarter solution, since I have to apply that to a large dataset and dont want to have to execute 2 group bys. I work with Cloudera Impala, but solutions in other SQL languages are also appreciated.
Thank you for your help.
You can do this using group by
twice and group_concat()
. I don't have Impala on hand, but something like this should work:
select timestamp,
group_concat(concat_ws(':', member, cnt))
from (select timestamp, member, count(*) as cnt
from t
group by timestamp, member
) tm
group by timestamp
order by timestamp;