
Computing Unemployment rates by education group from an indicator variable (Stata)


I have the following variable indicating whether an observation is employed or unemployed, where 0 means employed and 1 means unemployed.

dataex unemp
input float unemp
0
0
0
0
1
.
1
end

When I tabulate the variable:

Unemployment |      Freq.
-------------+-----------
    Employed |         80
  Unemployed |         20
-------------+-----------
    Total LF |        100

I want to divide 20 by 100 to obtain an overall unemployment rate of 20%. I have done this manually for now, but I think it is better to automate it, because I also want to compute unemployment rates by education group and by geographic region.

gen unemployment_broad = .
replace unemployment_broad = (20/100)*100
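One way to automate the 20/100 step, as a minimal sketch: because unemp is coded 0/1, its mean is the share unemployed, and summarize stores that mean in r(mean). The example reuses the seven observations from the dataex output above and assumes that the missing value marks someone outside the labour force, so it is left out of the denominator, which is exactly what summarize does with missing values.

```stata
* Minimal sketch: overall unemployment rate from a 0/1 indicator
clear
input float unemp
0
0
0
0
1
.
1
end

* meanonly computes the mean quietly; missing values of unemp are excluded
summarize unemp, meanonly

* Store the rate (in percent) instead of hard-coding (20/100)*100
generate unemployment_broad = 100 * r(mean)
display unemployment_broad[1]
```

With these seven example observations the rate is 100 × 2/6, about 33.3, rather than the 20% from the full tabulation; on the full data the same two lines give the 20% directly.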

The education variable is coded 1 "Less than basic", 2 "Basic", 3 "Secondary", 4 "Higher education".

Is there a way to compute the unemployment rate for each education group?

input float educ
2
4
4
4
2
4
1
3
3
3
end
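If you want the category names to show up in tables and graphs, you can attach them to educ as value labels. A sketch using the codes listed above; the label name educ_lbl is an arbitrary choice, not something from the original data:

```stata
* Sketch: attach the education categories as value labels
clear
input float educ
2
4
4
4
2
4
1
3
3
3
end

* educ_lbl is a hypothetical label name; any valid name works
label define educ_lbl 1 "Less than basic" 2 "Basic" 3 "Secondary" 4 "Higher education"
label values educ educ_lbl

* The table now shows the category names instead of the codes 1-4
tabulate educ
```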

Using Cybernike's solution, I tried to create a variable showing unemployment by education as follows, but I got an error:

gen unemp_educ = .
replace unemp_educ = bysort educ: summarize unemp

Ultimately, I want to visualize unemployment by education with something like this:

graph hbar (mean) unemp, over(educ)

I also intend to repeat the same calculation for other groups, such as demographic group and gender.


Solution

  • Your unemployment variable is coded 0/1, so the proportion unemployed is simply its mean. You can obtain it with either the summarize command or the collapse command, and both can be run by education group.

    clear
    input unemp educ
    0 2
    0 4
    0 4
    0 4
    1 2
    0 3
    1 3
    1 1
    1 3
    end
    
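    * Mean of unemp (the share unemployed) for each education group
    bysort educ: summarize unemp
    
    * Replace the data in memory with one row per education group
    collapse (mean) unemp, by(educ)
    
    list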
    
         +-----------------+
         | educ      unemp |
         |-----------------|
      1. |    1          1 |
      2. |    2         .5 |
      3. |    3   .6666667 |
      4. |    4          0 |
         +-----------------+
    

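    In response to your edit: replace cannot take the result of bysort educ: summarize unemp, because bysort is a command prefix and cannot appear inside an expression. To save the group means as a variable in the original (uncollapsed) dataset, use egen instead: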

    bysort educ: egen unemp_mean = mean(unemp)
    

    Your code for plotting the data seems to work fine.
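Since you also want rates by gender, region and other groups, the egen step can be wrapped in a loop over grouping variables. This is a sketch using the example data from the answer; only educ exists here, and grouping variables such as gender or region are hypothetical additions you would append to the varlist once they are in your data:

```stata
* Sketch: store the unemployment rate (in %) for each grouping variable
clear
input unemp educ
0 2
0 4
0 4
0 4
1 2
0 3
1 3
1 1
1 3
end

* Add further grouping variables (e.g. gender, region) to this varlist
foreach g of varlist educ {
    * Mean of a 0/1 indicator within each group = share unemployed
    bysort `g': egen rate_`g' = mean(unemp)
    replace rate_`g' = 100 * rate_`g'
}

list educ rate_educ, sepby(educ)
```

Keeping the rates as variables in the original data (rather than collapsing) means the data stay available for the next grouping variable in the loop.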