Consider the following toy dataset:
clear
input group str10 name n
1 "Jenny" 1
1 "Jenny" 1
1 "Ben" 1
1 "Tiffany" 1
1 "Sun" 1
2 "Jenny" 1
2 "Sun" 1
2 "Tiffany" 1
2 "S" 1
2 "T" 1
2 "R" 1
2 "Y" 1
2 "U" 1
2 "I" 1
2 "E" 1
2 "A" 1
2 "B" 1
3 "U" 1
3 "I" 1
3 "E" 1
3 "A" 1
3 "B" 1
end
My code is the following:
* n is already defined by the input block above, so gen n=1 is not needed
graph hbar (count) n, over(name, sort(1)) over(group)
With the data above, this produces a graph in which the names are all jumbled up.
How can I create a bar graph that shows only the top 10 categories by frequency, determined separately within each distinct value of group?
Here's a slightly modified example:
clear
input group str50 name n
1 "Jenny" 1
1 "Jenny" 1
1 "Ben" 1
1 "Tiffany" 1
1 "Jenny" 1
1 "Sun" 1
2 "Jenny" 1
2 "Sun" 1
2 "Sun" 1
2 "Tiffany" 1
2 "Tiffany" 1
2 "Tiffany" 1
2 "Tiffany" 1
2 "Tiffany" 1
2 "S" 1
2 "T" 1
2 "R" 1
2 "Y" 1
2 "U" 1
2 "I" 1
2 "E" 1
2 "A" 1
2 "B" 1
3 "U" 1
3 "Ramon" 1
3 "Ramon" 1
3 "Ramon" 1
3 "Ramon" 1
3 "I" 1
3 "I" 1
3 "I" 1
3 "E" 1
3 "A" 1
3 "B" 1
end
You can first collapse your dataset:
collapse (count) n, by(group name)
You can then control how many names are drawn in each group by adjusting the rank cutoff. The following keeps the two most frequent names per group; use _n <= 10 instead for the top 10:
gsort group -n
bysort group: generate tag = _n < 3
graph hbar (asis) n if tag, over(name) over(group) nofill
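The same steps can be put together as a short do-file sketch with the cutoff held in one place. It assumes the example data (group, name, n) are already in memory; the local macro k is a hypothetical name introduced here for illustration and does not appear in the original code. This variant ranks within group in a single bysort call rather than relying on the order left by an earlier sort, and it sorts the bars by frequency. Ties at the cutoff are broken arbitrarily.

```stata
* Sketch: keep the k most frequent names within each group, then plot.
* Assumes the example dataset (group, name, n) is in memory.
local k 10                                  // hypothetical cutoff, not in the original

* One row per group-name pair; n becomes the number of occurrences
collapse (count) n, by(group name)

* Sort ascending by n within group, so the last k rows are the most frequent.
* Groups with fewer than k names keep all of their names.
bysort group (n): generate byte tag = _n > _N - `k'

* Plot the tagged rows only; nofill drops names that are absent from a group
graph hbar (asis) n if tag, over(name, sort(1) descending) over(group) nofill
```

Setting k to 2 should select the same names as the tag = _n < 3 rule above, apart from how ties are broken.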