I am new to R, and could not find specific help for my question on this site.
I have (among others) ten character variables in my data frame grant_database, country_1 through country_10. Each contains either a country code, for example E20, F27 or G10, or an NA. Each case is a grant to a project. The ten country variables specify which country or countries a grant benefits. In my data frame, most, but not all, cases have at least one country code, always entered first in country_1; many have one in country_2 as well, and some also in country_3 to country_10. All empty fields are marked with NA.
id country_1 country_2 country_3 country_4 country_5 country_6 ...new_binaryvar
1 F20 NA NA NA NA NA 0
2 E12 E17 E52 NA NA NA 0
3 O62 O33 NA NA NA NA 0
4 E21 E20 NA NA NA NA 1
5 NA NA NA NA NA NA 0
...
I wish to create a new factor flagging grants that benefit a defined subset of countries. This binary "dummy" variable should be 1 for each case where at least one of the ten country variables contains a code from a given list, and 0 for each case/grant where none of its ten country variables contains such a code. For this example, let the codes to be flagged be E20, F27 and G10 (in reality, about 40 of the 150+ codes need to be flagged).
Would you help me out by suggesting a way to program this? Thank you very much for your help!
Assuming you want to check whether any code from a set of "countrycodes" appears in the "country" variables, so that a row gets 1 if at least one of those codes is present in it and 0 otherwise: create a vector (v1) of the country codes to check. Convert the dataset (df) to a matrix after removing the "id" column (as.matrix(df[,-1])) and compare it with v1 using %in%. Because %in% returns a plain logical vector without dimensions, turn it back into a matrix by assigning it the dimensions of df[,-1], i.e. c(5,7), with dim<-. Then rowSums counts the matches in each row, double negation (!!) turns any count above zero into TRUE and zero into FALSE, and finally adding 0 converts the logical result into the binary dummy variable.
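The steps described above can also be written out one at a time, which makes each intermediate object easy to inspect. This is a minimal sketch on a smaller illustrative example (three country columns instead of seven, so the matrix is 5 x 3); the names m, hits, n_hits and flag are chosen here for illustration only, and the dimensions are taken from the matrix itself rather than typed in.

```r
# Small illustrative data: an id column plus three country columns
df <- data.frame(
  id        = 1:5,
  country_1 = c("F20", "E12", "O62", "E21", NA),
  country_2 = c(NA, "E17", "O33", "E20", NA),
  country_3 = c(NA, "E52", NA, NA, NA),
  stringsAsFactors = FALSE
)
v1 <- c("E20", "F27", "G10")    # codes to flag

m <- as.matrix(df[, -1])        # drop "id": a 5 x 3 character matrix
hits <- m %in% v1               # plain logical vector of length 15 (NA gives FALSE)
dim(hits) <- dim(m)             # same effect as `dim<-`(hits, dim(m)): back to 5 x 3
n_hits <- rowSums(hits)         # number of matching codes in each row
flag <- as.integer(n_hits > 0)  # same 0/1 values as (!!n_hits) + 0
flag
# [1] 0 0 0 1 0
```

Using dim(m) instead of a hard-coded c(5,7) means the same lines work unchanged for any number of rows or country columns.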
# Example data: an id column plus seven country-code columns
df <- structure(list(id = 1:5, country_1 = c("F20", "E12", "O62", "E21",
NA), country_2 = c(NA, "E17", "O33", "E20", NA), country_3 = c(NA,
"E52", NA, NA, NA), country_4 = c(NA, NA, NA, NA, NA), country_5 = c(NA,
NA, NA, NA, NA), country_6 = c(NA, NA, NA, NA, NA), country_7 = c(NA,
NA, NA, NA, NA)), .Names = c("id", "country_1", "country_2",
"country_3", "country_4", "country_5", "country_6", "country_7"
), class = "data.frame", row.names = c(NA, -5L))

# Country codes to flag
v1 <- c('E20', 'F27', 'G10')
# Compare every cell with v1, restore the matrix shape (5 x 7 here),
# and return 1 for rows with at least one match, 0 otherwise
(!!rowSums(`dim<-`(as.matrix(df[,-1]) %in% v1, dim(df[,-1]))))+0
#[1] 0 0 0 1 0
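If the real data frame contains other variables besides id and the country columns, as the question's grant_database does, df[,-1] would include those as well. One alternative, a column-wise approach rather than the matrix reshape used above, selects the country columns by name, tests each one with %in%, and combines the results with a row-wise OR through Reduce. This is a minimal sketch, assuming the country columns are exactly those named country_<number>; the toy data, the title column, flag_codes and new_binaryvar_f are hypothetical.

```r
# Hypothetical toy version of grant_database; 'title' stands in for the other variables
grant_database <- data.frame(
  id        = 1:5,
  title     = c("a", "b", "c", "d", "e"),
  country_1 = c("F20", "E12", "O62", "E21", NA),
  country_2 = c(NA, "E17", "O33", "E20", NA),
  country_3 = c(NA, "E52", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Codes to flag (the real list would have about 40 entries)
flag_codes <- c("E20", "F27", "G10")

# Keep only the country_<number> columns, so other variables are ignored
country_cols <- grep("^country_[0-9]+$", names(grant_database), value = TRUE)

# One logical vector per column (NA %in% flag_codes is FALSE),
# then OR them together element-wise across columns
hit <- Reduce(`|`, lapply(grant_database[country_cols], `%in%`, flag_codes))

# Store the flag as a 0/1 integer and, since a factor was requested, as a factor
grant_database$new_binaryvar   <- as.integer(hit)
grant_database$new_binaryvar_f <- factor(grant_database$new_binaryvar, levels = 0:1)

grant_database$new_binaryvar
# [1] 0 0 0 1 0
```

This avoids reshaping altogether and works the same for three or ten country columns; setting levels = 0:1 keeps both factor levels even if no grant happens to be flagged.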