Search code examples
stata

Mutual exclusiveness in multiple observations (panel data) in Stata


I have multiple observations with different exposures and am using Stata/MP 16.1. I would like to group exposure by id depending on whether the exposure is mutually exclusive or not. Please see the data example.

The desired variable is groups which I made manually. Since the dataset contains >100,000 observations, how can I implement by code for the desired variable groups?

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str1 exposure long groups
1 "." 2
1 "a" 2
1 "a" 2
2 "a" 1
2 "." 1
2 "b" 1
2 "c" 1
3 "a" 1
3 "c" 1
3 "c" 1
4 "b" 3
4 "b" 3
4 "b" 3
end
label values groups groups
label def groups 1 "not mutually exclusive", modify
label def groups 2 "only a", modify
label def groups 3 "only b", modify

  

Solution

  •  * Example generated by -dataex-. To install: ssc install dataex
        clear
        input float id str1 exposure long groups
        1 "a" 2
        1 "a" 2
        1 "a" 2
        2 "a" 1
        2 "a" 1
        2 "b" 1
        2 "c" 1
        3 "a" 1
        3 "c" 1
        3 "c" 1
        4 "b" 3
        4 "b" 3
        4 "b" 3
        end
        label values groups groups
        label def groups 1 "not mutually exclusive", modify
        label def groups 2 "only a", modify
        label def groups 3 "only b", modify
        
        bysort id (exposure) : gen wanted = cond(exposure[1] != exposure[_N], 1, cond(exposure[1] == "a", 2, cond(exposure[1] == "b", 3, .)))
        label val wanted groups 
    
        assert wanted == groups 
    

    The logic is that

    if there are different values within an id, assign 1

    otherwise, the values are the same; so

    assign 2 if the first value is a (equivalently all values are a)

    assign 3 if the first value is b (equivalently all values are b)

    otherwise assign missing -- there shouldn't be any such according to your example, but checking is a good idea.

    Naturally you can break that down into shorter statements:

    bysort id (exposure) : gen wanted = 1 if exposure[1] != exposure[_N] 
    by id: replace wanted = 2 if exposure[1] == "a" 
    by id: replace wanted = 3 if exposure[2] == "b" 
    

    EDIT Here is some technique more complicated set-ups. Note that Stata doesn't attach any special meaning to ".".

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str1 exposure long groups
    1 "." 2
    1 "a" 2
    1 "a" 2
    2 "a" 1
    2 "." 1
    2 "b" 1
    2 "c" 1
    3 "a" 1
    3 "c" 1
    3 "c" 1
    4 "b" 3
    4 "b" 3
    4 "b" 3
    end
    
    label values groups groups
    label def groups 1 "not mutually exclusive", modify
    label def groups 2 "only a", modify
    label def groups 3 "only b", modify
    label def groups 4 "only c", modify
    
    gen OK = exposure != "."
    sort OK id exposure 
    by OK id: gen wanted = 1 if OK & exposure[1] != exposure[_N] 
    by OK id: replace wanted = 2 if wanted == . & OK & exposure[1] == "a"
    by OK id: replace wanted = 3 if wanted == . & OK & exposure[1] == "b"
    by OK id: replace wanted = 4 if wanted == . & OK & exposure[1] == "c"
    
    bysort id (exposure OK) : replace wanted = wanted[_N]
    drop OK 
    label val wanted groups 
    
    list, sepby(id)
    
         +-----------------------------------------------------------------+
         | id   exposure                   groups                   wanted |
         |-----------------------------------------------------------------|
      1. |  1          .                   only a                   only a |
      2. |  1          a                   only a                   only a |
      3. |  1          a                   only a                   only a |
         |-----------------------------------------------------------------|
      4. |  2          .   not mutually exclusive   not mutually exclusive |
      5. |  2          a   not mutually exclusive   not mutually exclusive |
      6. |  2          b   not mutually exclusive   not mutually exclusive |
      7. |  2          c   not mutually exclusive   not mutually exclusive |
         |-----------------------------------------------------------------|
      8. |  3          a   not mutually exclusive   not mutually exclusive |
      9. |  3          c   not mutually exclusive   not mutually exclusive |
     10. |  3          c   not mutually exclusive   not mutually exclusive |
         |-----------------------------------------------------------------|
     11. |  4          b                   only b                   only b |
     12. |  4          b                   only b                   only b |
     13. |  4          b                   only b                   only b |
         +-----------------------------------------------------------------+