
How to generate a variable equal to 1 if at least two dummy variables equal 1 in Stata?


I am trying to generate a dummy variable that equals 1 if at least two (out of seven) dummy variables also equal 1. Could anybody tell me an efficient way of doing this?


Solution

  • Let's suppose that the indicator variables concerned (you say "dummy variables", but that term is over-used given its disadvantages) are x1 ... x7. I take it from that definition that their values are 1 or 0, except that values may also be missing. Then the logic for the summary you want is

    gen xs = (x1 + x2 + x3 + x4 + x5 + x6 + x7) >= 2 if (x1 + x2 + x3 + x4 + x5 + x6 + x7) < . 
    

    That's not too difficult to type, given copy and paste to replicate the syntax for the sum. The if qualifier sets aside any observations with a missing value on any of the indicators: their total x1 + x2 + x3 + x4 + x5 + x6 + x7 is itself missing, so the new variable is returned as missing for them. Stata treats missing as arbitrarily large, and certainly as greater than 2, which explains why the simpler code

    gen xs = (x1 + x2 + x3 + x4 + x5 + x6 + x7) >= 2 
    

    would bite you if missings were present: any observation with a missing indicator would be coded 1.
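
    A quick way to see the difference is a minimal sketch on made-up data. The three observations and the names xs_naive and xs_safe are hypothetical, chosen only for illustration:

    ```stata
    * Toy data (assumed): observation 3 has a missing value on x2
    clear
    input x1 x2 x3 x4 x5 x6 x7
    1 1 0 0 0 0 0
    0 1 0 0 0 0 0
    0 . 0 0 0 0 0
    end

    * Naive version: the missing total in observation 3 counts as >= 2
    gen xs_naive = (x1 + x2 + x3 + x4 + x5 + x6 + x7) >= 2

    * Guarded version: observation 3 is left missing instead
    gen xs_safe = (x1 + x2 + x3 + x4 + x5 + x6 + x7) >= 2 if (x1 + x2 + x3 + x4 + x5 + x6 + x7) < .

    list x1 x2 xs_naive xs_safe
    ```

    In the listing, the two new variables should agree on the first two observations and differ only on the third.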

    If you want a more complicated rule, you may find yourself reaching for the egen functions rowtotal(), rowmiss(), and so forth. See the help for egen.
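
    As one possible sketch of that route, the code below uses rowtotal() and rowmiss() on made-up data. The variable names n_ones, n_miss, xs_strict and xs_lenient are hypothetical, and the two rules shown are just examples of choices you might make about missings. It assumes x1 through x7 are adjacent in the dataset, so that the varlist x1-x7 picks up exactly those seven variables.

    ```stata
    * Toy data (assumed): observation 3 has a missing value on x3
    clear
    input x1 x2 x3 x4 x5 x6 x7
    1 1 0 0 0 0 0
    1 0 0 0 0 0 0
    1 1 . 0 0 0 0
    0 0 0 0 0 0 0
    end

    * rowtotal() sums across variables, treating missing values as 0
    egen n_ones = rowtotal(x1-x7)

    * rowmiss() counts how many of the variables are missing
    egen n_miss = rowmiss(x1-x7)

    * Strict rule: return missing whenever any indicator is missing
    gen xs_strict = n_ones >= 2 if n_miss == 0

    * Lenient rule: judge by the observed indicators only
    gen xs_lenient = n_ones >= 2

    list n_ones n_miss xs_strict xs_lenient
    ```

    The strict rule reproduces the logic of the if qualifier above; the lenient rule codes observation 3 as 1 because two of its observed indicators are already 1, whatever the missing one might have been.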