
Identify group with two variables


Suppose I have the following data in Stata:

clear 
input id tna ret str2 name
1 2 3 "X"
1 3 2 "X"
1 5 3 "X"
1 6 -1 "X"
2 4 2 "X"
2 6 -1 "X"
2 8 -2 "X"
2 9 3 "P"
2 11 -2 "P"
3 3 1 "Y"
3 4 0 "Y"
3 6 -1 "Y"
3 8 1 "Z"
3 6 1 "Z"
end

I want to create an identifier for new groups. Each new group should contain all observations with the same name (for example, X), and should also contain every observation of an ID if the name started in that ID. For example:

  1. X appears in the data set under two IDs: 1 and 2. The group of X should contain all observations with the name X, and also the two observations with the name P (X started in ID 2, so the two P observations under that ID belong to group X).

  2. Y started in ID 3, so the group should incorporate every observation with ID 3.


Solution

  • This is a tricky problem to solve because it may take several passes before the identifiers completely stabilize. Fortunately, you can use group_id (from SSC) to solve this. To install group_id, type in Stata's Command window:

    ssc install group_id
    

    Here's a more complicated data example where "P" also appears in ID == 4 and that ID also contains "A" as a name:

    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id tna ret) str2 name
    1  2  3 "X"
    1  3  2 "X"
    1  5  3 "X"
    1  6 -1 "X"
    2  4  2 "X"
    2  6 -1 "X"
    2  8 -2 "X"
    2  9  3 "P"
    2 11 -2 "P"
    3  3  1 "Y"
    3  4  0 "Y"
    3  6 -1 "Y"
    3  8  1 "Z"
    3  6  1 "Z"
    4  9  3 "P"
    4 11 -2 "P"
    4 12  0 "A"
    end
    
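    * Work on a copy so that the original id is preserved
    clonevar newid = id
    * Merge identifiers whenever observations share the same name
    group_id newid, match(name)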
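
    To illustrate why several passes may be needed, here is a minimal sketch that propagates the smallest identifier by hand, alternating between name and id until nothing changes. This is not how group_id is implemented; it is only an illustration of the idea, and the variable names manualid, m1 and m2 are made up for this example. It assumes the data above are in memory and that name has no missing values (empty names would all be treated as one name).

    ```stata
    * Hypothetical manual version: start from a copy of the original id
    clonevar manualid = id

    local changed 1
    while `changed' {
        * Pass 1: every observation with the same name gets the smallest identifier
        bysort name: egen m1 = min(manualid)
        * Pass 2: every observation with the same original id gets the smallest of those
        bysort id: egen m2 = min(m1)
        * Count how many identifiers would still change; stop when none do
        count if m2 != manualid
        local changed = r(N)
        replace manualid = m2
        drop m1 m2
    }

    * Inspect the resulting groups
    sort manualid id
    list id name manualid, sepby(manualid)
    ```

    With the example data, the first pass links ID 2 to ID 1 through "X", and only the second pass links ID 4 to that group through "P", so the loop needs more than one iteration. At the end, IDs 1, 2 and 4 share manualid == 1 and ID 3 keeps manualid == 3.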