Coding dichotomous variables in Stata

I have a set of dichotomous variables for firm size: emp1_2 (i.e. firms with 1 or 2 employed people, including the owner), emp3_9, emp10_19, emp20_49, emp50_99, emp100_249, emp250_499 and emp500. In addition, I have no size information for 27 firms, but my educated guess is that they are large firms.

I want to create a dichotomous variable for a firm being a "small firm"; this variable should equal 1 when emp1_2==1 | emp3_9==1 | emp10_19==1, and 0 otherwise.

As far as I understand Stata (I am only a basic user), the following two methods of constructing a dichotomous variable should be equivalent.

Method 1)

gen lar_firm = 0
replace lar_firm = 1 if emp1_2==1 | emp3_9==1 | emp10_19==1

Method 2)

gen lar_firm = (emp1_2 | emp3_9 | emp10_19)
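The two methods can be compared side by side on a tiny dataset. This is a minimal sketch, assuming (hypothetically) that a firm with unknown size has a missing value (.) in every size dummy; the three observations and the names m1 and m2 are invented for illustration, and it is meant to be run from a do-file.

```stata
* Hypothetical toy data: only the three "small" dummies are needed here.
* Firm 1 is small, firm 2 is in some larger class, firm 3 has unknown size.
clear
input emp1_2 emp3_9 emp10_19
1 0 0
0 0 0
. . .
end

* Method 1: the flag becomes 1 only where a dummy is explicitly equal to 1
gen m1 = 0
replace m1 = 1 if emp1_2==1 | emp3_9==1 | emp10_19==1

* Method 2: each dummy is tested for being nonzero, which includes missing
gen m2 = (emp1_2 | emp3_9 | emp10_19)

* m1 is 1, 0, 0 while m2 is 1, 0, 1: the unknown-size firm is flagged by Method 2
list
```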

Instead, I have found that with Method 2, lar_firm equals 1 not only for firms with emp1_2 | emp3_9 | emp10_19, but also for the firms that fall into none of the categories (emp1_2, emp3_9, emp10_19, emp20_49, emp50_99, emp100_249, emp250_499, emp500), i.e. the ones I guess are large firms.

I am wondering whether there is some subtle difference between the two methods. I thought they would produce identical results.

Solution

When you do

gen lar_firm = emp1_2 | emp3_9 | emp10_19

you're testing if

(emp1_2 != 0) | (emp3_9 != 0) | (emp10_19 != 0)

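In particular, a missing value (.) is different from 0; in fact, Stata treats it as greater than any nonmissing number. So for the firms of unknown size, whose dummies are missing, the expression evaluates to 1. Method 1, by contrast, tests emp1_2==1 and so on, and . == 1 is false, so those firms keep the initial value 0.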
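The behaviour of missing values can be checked directly with display. A minimal sketch, to be run from a do-file (the // comments are not accepted in the interactive Command window); c(maxdouble) is Stata's largest nonmissing double.

```stata
* How Stata evaluates the system missing value (.) in expressions
display (. != 0)              // 1: missing is not equal to 0
display (. == 1)              // 0: missing is not equal to 1
display (. > c(maxdouble))    // 1: missing exceeds every nonmissing number
display (. | 0)               // 1: in a logical expression, missing counts as true
```

The first and last lines explain Method 2; the second line explains why Method 1 leaves the unknown-size firms at 0.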

For more information:

http://www.stata.com/support/faqs/data-management/logical-expressions-and-missing-values/
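One way to make the one-line version behave like Method 1 is to compare each dummy with 1 explicitly. A minimal sketch on hypothetical toy data (same assumption as before: unknown-size firms are missing in every dummy); the names small_firm and small_firm_m are illustrative, and the question's lar_firm would work the same way.

```stata
* Hypothetical toy data: small firm, larger firm, firm of unknown size
clear
input emp1_2 emp3_9 emp10_19
1 0 0
0 0 0
. . .
end

* Option A: explicit ==1 tests, equivalent to Method 1.
* Unknown-size firms get 0, matching the guess that they are large.
gen small_firm = (emp1_2==1 | emp3_9==1 | emp10_19==1)

* Option B: keep unknown-size firms as missing instead of 0,
* so they are not silently classified either way
gen small_firm_m = (emp1_2 | emp3_9 | emp10_19) if !missing(emp1_2, emp3_9, emp10_19)

list
```

Option A fits the stated intent of treating the 27 firms as large; Option B is useful if you later want to check how much that guess matters.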