Drawing percentage graphs without percent variables


I have a dataset in Stata that looks like the following:

Year Gender Presidents Presidents_F Presidents_M Presidents_Total
2023 Male 5 6 5 11
2023 Female 6 6 5 11
2023 Total 11 6 5 11
2022 Male 3 2 3 5
2022 Female 2 2 3 5
2022 Total 5 2 3 5

I want to draw stacked graphs that show the percentage shares of female and male presidents (summing to 100) in each year (the dataset runs from 1970 to 2023), using the absolute numbers already available rather than generating separate percentage variables.

I earlier tried generating line graphs from percentage variables, and that worked well in showing the trends. However, my supervisor wants me to avoid generating unnecessary additional variables while still graphing the percentages across years.

She suggested using catplot, but I am unsure how catplot can give me the ability to define the formula for the percentage within the command.

Please suggest the best way to go forward.


Solution

  • Your data already show considerable redundancy. You don't cite your previous threads in which I advised (twice over) against keeping totals in separate observations.

    Conditional Division in Stata

    Is there a way to calculate percentages comparing observations?

    catplot (a community-contributed command from SSC; you are asked to explain where such commands come from) can give you stacked bars, using its percent() option. However, the two percentages are complementary, so showing both is redundant; a line graph of either one avoids that.

    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int year str6 gender byte(presidents presidents_f presidents_m presidents_total)
    2023 "Male"    5 6 5 11
    2023 "Female"  6 6 5 11
    2023 "Total"  11 6 5 11
    2022 "Male"    3 2 3  5
    2022 "Female"  2 2 3  5
    2022 "Total"   5 2 3  5
    end
    
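    * catplot is community-contributed: install once with -ssc install catplot-
    * Stacked bars of gender shares within each year, weighting by the counts
    catplot gender year if inlist(gender, "Male", "Female") [fw=presidents], percent(year) asyvars stack
    
    * Sorted by gender within year, the rows run Female, Male, Total,
    * so presidents[1] is the female count and presidents[3] the total
    bysort year (gender) : gen pcfemale = 100 * presidents[1] / presidents[3]
    levelsof year, local(years)
    line pcfemale year, ytitle(% female presidents) xla(`years')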
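
    If the aim is strictly to avoid creating any new variable, official graph bar may also be worth a try. What follows is only a sketch, assuming that presidents_f and presidents_m hold the yearly counts in every observation, as in the example data above; the legend labels and axis title are illustrative choices. The percentages option rescales the stacked bars so that each year sums to 100. The /// continuation lines need a do-file or the do-file editor.

    * Sketch only: one observation per year suffices, as each row carries both counts
    graph bar (asis) presidents_f presidents_m if gender == "Total", ///
        over(year) stack percentages ///
        legend(order(1 "Female" 2 "Male")) ytitle("% of presidents")

    With 54 years on the axis, the year labels are likely to need thinning or rotating, for example with a label() suboption inside over().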