I have the following dataset, and I hope to create different groups to count the occurrence of values under name.
Have: (county is in string)
name state county
apple MD 1
apple DC 1
pear VA 1
pear VA 2
pear CA 5
peach CO 3
peach CO 3
peach CO 2
peach CO 2
Want:
name state county freq_name freq_state freq_county
apple MD 1 2 1 2
apple DC 1 2 1 2
pear VA 1 3 2 3
pear VA 2 3 2 3
pear CA 5 3 1 3
peach CO 3 4 4 2
peach CO 2 4 4 2
I believe by SQL , over partition will allow count by different levels something like:
count(name) over (partition by name) as freq_name,
count(name) over (partition by state) as freq_state,
count(name) as freq_county
from have
group by name,state, county;
For some reason, this piece of code does not give me the right counts for freq_name. I also want to check if my code for freq_state and freq_county are accurate. Thanks!
You seem to want:
select name, state, county, count(*) as this_count,
sum(count(*)) over (partition by name) as freq_name,
sum(count(*)) over (partition by state) as freq_state,
sum(count(*)) as freq_county
from have
group by name, state, county;