Suppose I have the following data in Stata:
clear
input id tna ret str2 name
1 2 3 "X"
1 3 2 "X"
1 5 3 "X"
1 6 -1 "X"
2 4 2 "X"
2 6 -1 "X"
2 8 -2 "X"
2 9 3 "P"
2 11 -2 "P"
3 3 1 "Y"
3 4 0 "Y"
3 6 -1 "Y"
3 8 1 "Z"
3 6 1 "Z"
end
I want to make an ID for new groups. These new groups should incorporate the observations with the same name (for example X), but should also incorporate all the observations of the same ID if the name started in that ID. For example:
X is in the data set under two IDs: 1 and 2. The group of X should incorporate all the observations with the name X, but also the two observations of the name P (since X started in ID 2 and the two observations with value P belong to group X).
Y started in ID 3, so the group should incorporate every observation with ID 3.
This is a tricky problem to solve because it may take several passes to completely stabilize the identifiers. Fortunately, you can use group_id (from SSC) to solve this. To install group_id, type in Stata's Command window:
ssc install group_id
Here's a more complicated data example where "P" also appears in ID == 4 and that ID also contains "A" as a name:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id tna ret) str2 name
1 2 3 "X"
1 3 2 "X"
1 5 3 "X"
1 6 -1 "X"
2 4 2 "X"
2 6 -1 "X"
2 8 -2 "X"
2 9 3 "P"
2 11 -2 "P"
3 3 1 "Y"
3 4 0 "Y"
3 6 -1 "Y"
3 8 1 "Z"
3 6 1 "Z"
4 9 3 "P"
4 11 -2 "P"
4 12 0 "A"
end
clonevar newid = id
group_id newid, match(name)
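To see why several passes can be needed, here is a minimal hand-rolled sketch of the same idea using only built-in commands. It is not a description of how group_id works internally. It repeatedly pushes the smallest identifier across observations that share a name and then across observations that share an original id, and stops once nothing changes. The variable names newid2, old, by_name, and by_id are made up for illustration, and the sketch assumes name is never empty.

```stata
* Manual propagation sketch (assumption: an illustration, not group_id's algorithm)
clonevar newid2 = id
local changed = 1
while `changed' {
    * remember the identifiers at the start of this pass
    generate double old = newid2

    * observations sharing a name take the smallest identifier seen for that name
    bysort name: egen double by_name = min(newid2)
    quietly replace newid2 = by_name

    * observations sharing an original id take the smallest identifier seen in that id
    bysort id: egen double by_id = min(newid2)
    quietly replace newid2 = by_id

    * loop again only if some identifier changed during this pass
    quietly count if newid2 != old
    local changed = r(N)
    drop old by_name by_id
}
list id name newid newid2, sepby(id)
```

On the example data above, newid2 ends at 1 for original IDs 1, 2, and 4 (linked through "X" and "P") and at 3 for ID 3. ID 4 only reaches 1 on the second pass, because "P" has to inherit the identifier from ID 2 first. For large data sets group_id is likely the more practical choice.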