I want to select individuals from a control pool to act as a control for a treated case. Individuals for the control are selected if their activity/inactivity in a year matches the treated's pattern of activity.
Eg. A treated case has: activity in 2009, 2011 but no activity in 2010 (so no record for that year). This treated's control should only be made up of individuals who also had activity in 2009 and 2011 and no activity in 2010.
I'm only looking at 3-year blocks in the treated activity pattern. So from the maximum year where there is activity to 2 years prior.
If I have separate data.tables for the treated and the control pool, how would I go about matching them?
Treated would look like:
treated <- data.table(id = c(1, 1, 1, 2, 2, 3, 3, 4),
yr = c(2010, 2011, 2012, 2011, 2012, 2010, 2012, 2013))
id yr
1 2010
1 2011
1 2012
2 2011
2 2012
3 2010
3 2012
4 2013
Control
control <- data.table(id = c(rep(5, 6), rep(6, 3), 7, 7, 8, 8, 8),
yr = c(2009:2014, 2011, 2012, 2013, 2010, 2012, 2009, 2013, 2014))
id yr
5 2009
5 2010
5 2011
5 2012
5 2013
5 2014
6 2011
6 2012
6 2013
7 2010
7 2012
8 2009
8 2013
8 2014
I've tried putting both in wide format so that NAs could be produced for the missing years but couldn't figure out how to match from there.
dcast(treated, id ~ yr, value.var = "yr")
id 2010 2011 2012 2013
-----------------------------------
1 2010 2011 2012 NA
2 NA 2011 2012 NA
3 2010 NA 2012 NA
4 NA NA NA 2013
So the activity/inactivity pattern for each of the treated would be:
id pattern ===================== 1 2010 2011 2012 2 NA 2011 2012 3 2010 NA 2012 4 NA NA 2013
dcast(control, id ~ yr, value.var = "yr")
id 2009 2010 2011 2012 2013 2014
----------------------------------------------------
5 2009 2010 2011 2012 2013 2014
6 NA NA 2011 2012 2013 NA
7 NA 2010 NA 2012 NA NA
8 2009 NA NA NA 2013 2014
So 1
should match with 5
2
with 6
3
with 7
4
with 8
Can anyone point me in the right direction?
(Not sure about the best way to handle this but right now I have a function that takes a treated case, selects individuals for the control and calculates effects. So I would be happy with a list of control ids that match a particular treated case. Then I'd use those ids to subset the main control data.table.)
For expected output, say the matching method was in this function:
get_control_ids <- function(treated_id){
...
return(vector_of_control_ids)
}
Then running this function on the id 1
get_control_ids(1)
would result in a vector containing the control ids that match it.
So using my tiny control pool, the function would return a vector containing just 5
.
Edit: I'm not sure what the output should look like. So any tips on that would be handy too. Maybe a data.table like:
treated_id control_ids
-----------------------------------
1 5, 10, 13
2 6, 22, 23
For more context, I have a calculate_effects(treated_key)
function which works in this way:
I have 3 data.tables:
treated
- has ids repeated with each year of activity (plus other columns)
treated_keys
- has unique ids (in my actual data I have two columns to identify a particular treated)
control
I use my calculate_effects(treated_key)
function like so:
results <- treated_keys[, calculate_effects(.SD), by = 1:nrow(treated_keys)]
Within the function, treated
is subsetted using the particular treated_key
which will extract all records belonging to that particular treated case.
Also within the function, control
is subsetted according to some matching rules, and then used in calculations.
I would like to add code within my calculate_effects(treated_key)
which will correctly select the control for that particular treated_key
.
Assuming 3 yearly blocks, you can do the following:
#expand treated to fill in gap years if any
exptrt <- treated[, .(yr=seq(max(yr)-2L, max(yr))), by=.(id)][,
att := 0L][
treated, att := 1, on=.(id, yr)]
#pivot control into id against yr
pctrl <- dcast(control, id ~ yr, length, value.var = "yr")
#for each id, pivot treated and join with control using
#whatever years are in treated incl gap years
exptrt[, cid := id]
exptrt[, {
ptrt <- dcast(.SD, cid ~ yr, value.var="att")
pctrl[ptrt, on=names(ptrt)[-1L], x.id]
}, by=.(id)]
output:
id V1
1: 1 5
2: 2 6
3: 3 7
4: 4 8