I want to create a new variable (named "treatment") in one dataset using information from a second dataset. My real datasets are large and contain other variables, but for simplicity, let's say I have the following:
# individual-level data, birth years
a <- data.frame(country_code = c(2, 2, 2, 10, 10, 10, 10, 8),
                birth_year = c(1920, 1930, 1940, 1970, 1980, 1990, 2000, 1910))
# country-level reform info with affected cohorts
b <- data.frame(country_code = c(2, 10, 10, 11),
                lower_cutoff = c(1928, 1975, 1907, 1934),
                upper_cutoff = c(1948, 1995, 1927, 1948),
                cohort = c(1938, 1985, 1917, 1942))
Dataset a is individual-level data with birth years, and dataset b is country-level data with reform information. Based on dataset b, I want to create a treatment column in dataset a. Treatment is 1 if birth_year is between cohort and upper_cutoff, and 0 if it is between lower_cutoff and cohort. Anything else should be NA.
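The rule above can be written down for a single reform (one row of b) as a small sketch. The helper name assign_treatment is hypothetical, and treating both cutoffs as inclusive is an assumption.

```r
# Hypothetical helper: treatment status for a vector of birth years
# against ONE reform (one row of b). Both cutoffs are assumed inclusive.
assign_treatment <- function(birth_year, lower_cutoff, cohort, upper_cutoff) {
  out <- rep(NA_real_, length(birth_year))
  # cohort <= birth_year <= upper_cutoff: treated
  out[birth_year >= cohort & birth_year <= upper_cutoff] <- 1
  # lower_cutoff <= birth_year < cohort: control
  out[birth_year >= lower_cutoff & birth_year < cohort] <- 0
  out
}

# Country 2 reform: lower_cutoff = 1928, cohort = 1938, upper_cutoff = 1948
assign_treatment(c(1920, 1930, 1940), 1928, 1938, 1948)
# NA 0 1
```

This only covers one reform at a time; matching individuals to the reforms of their own country is the part the loops below try to do.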
After creating an empty treatment column, I used the following code:
for (i in 1:nrow(a)) {
  for (j in 1:nrow(b)) {
    # note: my real data calls the country column "ison"; here it is country_code
    a$treatment[i] <- ifelse(a$country_code[i] == b$country_code[j] &
                               a$birth_year[i] >= b$cohort[j] &
                               a$birth_year[i] <= b$upper_cutoff[j], "1",
                      ifelse(a$country_code[i] == b$country_code[j] &
                               a$birth_year[i] < b$cohort[j] &
                               a$birth_year[i] >= b$lower_cutoff[j], "0", NA))
  }
}
As well as:
library(dplyr)  # for case_when()
for (i in 1:nrow(a)) {
  for (j in 1:nrow(b)) {
    a[i, "treatment"] <- case_when(a[i, "country_code"] == b[j, "country_code"] &
                                     a[i, "birth_year"] >= b[j, "cohort"] &
                                     a[i, "birth_year"] <= b[j, "upper_cutoff"] ~ 1,
                                   a[i, "country_code"] == b[j, "country_code"] &
                                     a[i, "birth_year"] < b[j, "cohort"] &
                                     a[i, "birth_year"] >= b[j, "lower_cutoff"] ~ 0)
  }
}
Both versions run, but they only return NAs. This is the result I want to get:
treatment <- c(NA, 0, 1, NA, 0, 1, NA, 0)
Any ideas about what is wrong? Or any other suggestions? Thanks in advance!
I found the mistake in my code: every pass of the inner loop overwrites a$treatment[i], so only the comparison with the last row of b survives. Using a break once a value has been assigned avoids this. I'm still open to other answers.
for (i in 1:nrow(a)) {
  for (j in 1:nrow(b)) {
    if (!is.na(a$treatment[i])) { break }  # stop once a value has been assigned
    # note: my real data calls the country column "ison"; here it is country_code
    a$treatment[i] <- ifelse(a$country_code[i] == b$country_code[j] &
                               a$birth_year[i] >= b$cohort[j] &
                               a$birth_year[i] <= b$upper_cutoff[j], "1",
                      ifelse(a$country_code[i] == b$country_code[j] &
                               a$birth_year[i] < b$cohort[j] &
                               a$birth_year[i] >= b$lower_cutoff[j], "0", NA))
  }
}
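As one possible loop-free alternative, here is a minimal base R sketch that joins a to b with merge() and then keeps the first non-NA result per individual. The names row_id, m and first_non_na are hypothetical helpers, and the sketch assumes that reforms within a country do not overlap (if they do, which reform wins is not pinned down here).

```r
a <- data.frame(country_code = c(2, 2, 2, 10, 10, 10, 10, 8),
                birth_year = c(1920, 1930, 1940, 1970, 1980, 1990, 2000, 1910))
b <- data.frame(country_code = c(2, 10, 10, 11),
                lower_cutoff = c(1928, 1975, 1907, 1934),
                upper_cutoff = c(1948, 1995, 1927, 1948),
                cohort = c(1938, 1985, 1917, 1942))

# Hypothetical id column so results can be mapped back to the rows of a
a$row_id <- seq_len(nrow(a))

# Left join: one row per (individual, reform) pair; individuals whose
# country has no reform in b keep a single row with NA cutoffs
m <- merge(a, b, by = "country_code", all.x = TRUE)

# Apply the rule to every pair at once (NA cutoffs propagate to NA)
m$treatment <- ifelse(m$birth_year >= m$cohort & m$birth_year <= m$upper_cutoff, 1,
               ifelse(m$birth_year >= m$lower_cutoff & m$birth_year < m$cohort, 0,
                      NA_real_))

# Collapse back to one value per individual: first non-NA match, else NA
first_non_na <- function(x) if (all(is.na(x))) NA_real_ else x[!is.na(x)][1]
res <- tapply(m$treatment, m$row_id, first_non_na)
a$treatment <- as.numeric(res[as.character(a$row_id)])
a$row_id <- NULL

a$treatment
# NA 0 1 NA 0 1 NA NA
```

With the sample data as posted, the last row (country_code 8) has no matching country in b, so it comes out as NA here rather than 0.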