Search code examples
sumtabspivot-tablestatagroup

How to create a percentage + total for one variable in a matrix with rows and columns representing two categorical variables?


Let's say I have three variables:

  1. female, dummy that takes 1 if female, 0 if otherwise
  2. municipality, categorical variables of 10 categories (amsterdam, berlin, london, barcelona, copenhagen, oslo, stockholm, rome, vienna, lisbon)
  3. industry, with 5 categories (manufacturing, government, services, transport, education)

I want to create a table with municipality as rows, industry as columns and then I want each cell to contain two values:

  • % female for the particular combination of municipality and industry
  • N total (both female and not female) for the particular combination of municipality and industry

Is it possible to make such a table in Stata?


Solution

  • Here is a general code snippet that you should be able to adapt to your needs:

    clear
    set obs 10000
    
    * Randomly create the data
    set seed 123456
    gen municipality = ceil(runiform()*10)
    gen industry = ceil(runiform()*5)
    gen female = (runiform()<.6)
    
    *Calculate N and percentages (technically shares) by municipality and industry
    collapse (mean) percent_female=female (count) N=female , by(municipality industry)
    
    * Reshape so each row is a municipality
    reshape wide percent_female N, i(municipality) j(industry)
    
    * Export data to table
    *ssc install dataout
    local folder "path/to/folder"
    dataout , excel save("`folder'/mytable.xlsx")