Search code examples
sqlsql-serverhadoophivecloudera

SQL group by different levels on the same dataset


I have the following dataset, and I hope to create different groups to count the occurrence of values under name.

Have: (county is in string)

name   state  county 
apple   MD      1
apple   DC      1
pear    VA      1
pear    VA      2
pear    CA      5
peach   CO      3
peach   CO      3
peach   CO      2
peach   CO      2

Want:

name   state  county freq_name  freq_state  freq_county
apple   MD      1     2            1            2
apple   DC      1     2            1            2
pear    VA      1     3            2            3
pear    VA      2     3            2            3
pear    CA      5     3            1            3
peach   CO      3     4            4            2
peach   CO      2     4            4            2

I believe by SQL , over partition will allow count by different levels something like:

count(name) over (partition by name) as freq_name,
count(name) over (partition by state) as freq_state,
count(name) as freq_county
from have
group by name,state, county;

For some reason, this piece of code does not give me the right counts for freq_name. I also want to check if my code for freq_state and freq_county are accurate. Thanks!


Solution

  • You seem to want:

    select name, state, county, count(*) as this_count,
           sum(count(*)) over (partition by name) as freq_name,
           sum(count(*)) over (partition by state) as freq_state,
           sum(count(*)) as freq_county
    from have
    group by name, state, county;