stata dummy-variable

Coding dichotomous variables in Stata


I have a set of dichotomous variables for firm size: emp1_2 (i.e. a firm with 1 or 2 employed people, including the owner), emp3_9, emp10_19, emp20_49, emp50_99, emp100_249, emp250_499, emp500. In addition, I have no size information for 27 firms, but my educated guess is that they are large firms.

I want to create a dichotomous variable for a firm being a "small firm"; that is, a variable that equals 1 when emp1_2==1 | emp3_9==1 | emp10_19==1, and 0 otherwise.

As far as I understand Stata, of which I am only a basic user, the following two methods of constructing dichotomous variables should be equivalent.

Method 1)

gen lar_firm = 0
replace lar_firm = 1 if emp1_2==1 | emp3_9==1 | emp10_19==1

Method 2)

gen lar_firm = (emp1_2 | emp3_9 | emp10_19)

Instead, I found that with method 2) lar_firm equals 1 both for firms with emp1_2, emp3_9 or emp10_19 equal to 1 and for the firms that do not fall into any of the categories (i.e. emp1_2, emp3_9, emp10_19, emp20_49, emp50_99, emp100_249, emp250_499, emp500), which are the ones I guess to be large firms.

I am wondering whether there is some subtle difference between the two methods. I thought they should lead to the same outcome.


Solution

  • When you do

    gen lar_firm = emp1_2 | emp3_9 | emp10_19 
    

    you're testing if

    (emp1_2 != 0) | (emp3_9 != 0) | (emp10_19 != 0)
    

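    In particular, a missing value (.) is not equal to 0: Stata treats numeric missing values as larger than any nonmissing number. A firm whose size dummies are missing therefore evaluates as true and gets lar_firm = 1 under method 2), whereas method 1) tests == 1, which a missing value fails.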

    For more information:

    http://www.stata.com/support/faqs/data-management/logical-expressions-and-missing-values/
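
To see the difference on a small scale, here is a minimal sketch. The three-firm toy dataset and the names lar_firm1, lar_firm2 and lar_firm3 are made up for illustration; the third firm stands in for the 27 firms, assuming their size dummies are stored as missing.

```stata
* Toy data (hypothetical): firm 3 has missing size dummies
clear
input emp1_2 emp3_9 emp10_19
1 0 0
0 0 0
. . .
end

* Method 1: explicit == 1 tests; a missing value is not equal to 1, so firm 3 gets 0
gen lar_firm1 = 0
replace lar_firm1 = 1 if emp1_2==1 | emp3_9==1 | emp10_19==1

* Method 2: bare variables are tested as != 0; missing is nonzero, so firm 3 gets 1
gen lar_firm2 = (emp1_2 | emp3_9 | emp10_19)

* One-line form with explicit == 1 tests; gives the same values as method 1
gen lar_firm3 = (emp1_2==1 | emp3_9==1 | emp10_19==1)

list
```

If you would rather leave the indicator missing for those firms than code them as 0, one option is to add a qualifier such as if !missing(emp1_2, emp3_9, emp10_19) to the last gen command.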