Graph only the top names in terms of frequency in a bar plot


Consider the following toy dataset:

clear

input group str10 name n
1     "Jenny"   1
1     "Jenny"   1
1     "Ben"     1
1     "Tiffany" 1
1     "Sun"     1
2     "Jenny"   1
2     "Sun"     1
2     "Tiffany" 1
2     "S"       1
2     "T"       1
2     "R"       1
2     "Y"       1
2     "U"       1
2     "I"       1
2     "E"       1
2     "A"       1
2     "B"       1
3     "U"       1
3     "I"       1
3     "E"       1
3     "A"       1
3     "B"       1
end

My code is the following:

* n is already defined by the input block above, so gen n=1 is not needed here
graph hbar (count) n, over(name, sort(1)) over(group)

With the data above, this shows every name in every group, all jumbled together.

How can I create a bar graph that shows only the top 10 names by frequency, determined separately within each value of group?


Solution

  • Here's a slightly modified example:

    clear
    input group str50 name n
    1     "Jenny"   1
    1     "Jenny"   1
    1     "Ben"     1
    1     "Tiffany" 1
    1     "Jenny"   1
    1     "Sun"     1
    2     "Jenny"   1
    2     "Sun"     1
    2     "Sun"     1
    2     "Tiffany" 1
    2     "Tiffany" 1
    2     "Tiffany" 1
    2     "Tiffany" 1
    2     "Tiffany" 1
    2     "S"       1
    2     "T"       1
    2     "R"       1
    2     "Y"       1
    2     "U"       1
    2     "I"       1
    2     "E"       1
    2     "A"       1
    2     "B"       1
    3     "U"       1
    3     "Ramon"   1
    3     "Ramon"   1
    3     "Ramon"   1
    3     "Ramon"   1
    3     "I"       1
    3     "I"       1
    3     "I"       1
    3     "E"       1
    3     "A"       1
    3     "B"       1
    end
    

    You can first collapse your dataset to one observation per group and name, with n holding the frequency:

    collapse (count) n, by(group name)
    

    You can then control the number of names drawn in each group by adjusting the rank cutoff. Here _n < 3 keeps the two most frequent names per group; use _n <= 10 for the top 10:

    gsort group -n
    bysort group: generate tag = _n < 3
    
    graph hbar (asis) n if tag, over(name) over(group) nofill
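    
    For the top 10 asked about in the question, a variant along the following lines may work. It is only a sketch and assumes the collapsed dataset from above is in memory. The variable names rank and top10 are made up for illustration, and ties at the cutoff are broken arbitrarily:

    * sort by n within group, so the last observation has the highest count
    bysort group (n): generate rank = _N - _n + 1
    * flag the 10 most frequent names in each group (hypothetical variable name)
    generate top10 = rank <= 10
    
    graph hbar (asis) n if top10, over(name, sort(1) descending) over(group) nofill

    Changing 10 to another number changes how many names are kept per group. Groups with fewer than 10 distinct names simply show all of them.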
    
