A simplified version of my data would look like:
Own_Name Own_Position Year Boss_Name Boss_Position
John Director 2017 Tess Managing Director
Tess Lead Director 2017 Jim CEO
John Lead Director 2018 Jim CEO
Tess CFO 2018 Jim CEO
All data concerning Own_Name
, Own_Position
and Year
is present (so Jim and any others would get their own rows with Own_Name
and Boss_Name
etc.), but there are many entries where the Boss_Position (but not
Boss_Name`) is missing.
Thus, I'm trying to find what the Boss_Position
will be, given the Boss_Name
and Year
. I can do this if I need to pull just one observation (just keep the relevant data where Own_Name
matches Boss_Name
and Year
matches, and use the corresponding Own_Position
), but I am not sure what the best way to do this would be where I'm looping through all the missing observations for Boss_Name
, since using keep
would seem to be very destructive and time consuming.
Ideally, the code would look something like
replace Boss_Position = Own_Position[YearNameMatcher(Boss_Name Year)] if missing(Boss_Position)
where YearNameMatcher
is the function that does what I'm asking for, but I am not sure how to best proceed.
I'm also new to Stata, so I may not be aware of more obvious solutions, though I have tried searching without success.
One solution is to generate a separate and temporary dataset with the unique values of variables Boss_Name-Year
and then merge
this dataset with the original data. You can try this code with your data:
snapshot save
keep if Boss_Position!=""
drop Own*
duplicates drop Year Boss_name, force
tempfile boss
save `boss'
snapshot restore 1
drop Boss_Position
merge m:1 Boss_Name Year using `boss'
snapshot erase _all
This code supposes that the paired values Boss_name
and Year
are unique. With this approach you don't need to loop through all the missing observations.