I have a couple thousand lines of Stata code from a peer that replace negative (missing-value) codes with a proper missing value (`.`), and I need to port this code to R. To do so, I have saved the code as a single column of character strings. The replacements all look like the following, ad nauseam:
```
replace R04_ADULTTYPE = . if (R04_ADULTTYPE <= -1 )
```
These `R04_` variables are columns in a data set, so I hope to translate these lines of Stata to R efficiently.
I have tried splitting and substituting these strings so that I can iterate over a list of variables that need replacing, but I am running low on ideas. How can I transfer these replacements en masse to R, given that I have them as a data set of character strings? My expected output is simply the effect of running all of these Stata replaces in R, using the data below.
`dput` of the head of the data (`rawMissing`). Thanks!
```r
# Data (many Stata replaces)
dput(head(rawMissing))
structure(list(replacements = c("replace R04_ADULTTYPE = . if (R04_ADULTTYPE <= -1 )",
"replace R04R_A_AT0047 = . if (R04R_A_AT0047 <= -1 )", "replace R04R_A_AM0069 = . if (R04R_A_AM0069 <= -1 )",
"replace R04R_A_AM0065_V2 = . if (R04R_A_AM0065_V2 <= -1 )",
"replace R04_AM0066 = . if (R04_AM0066 <= -1 )", "replace R04_AM0070 = . if (R04_AM0070 <= -1 )"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
# Expected output would be efficiently conducting these many replaces in R
```
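For reference, each Stata line maps onto a single subset assignment in R. The sketch below translates the first line above; the data frame `toy` and its values are made up purely for illustration.

```r
# Hypothetical toy data standing in for the real data set
toy <- data.frame(R04_ADULTTYPE = c(1, -1, 3, -8, NA))

# Stata: replace R04_ADULTTYPE = . if (R04_ADULTTYPE <= -1 )
# R: set every value <= -1 to NA; values that are already NA compare as NA
# and are left as they are
toy$R04_ADULTTYPE[toy$R04_ADULTTYPE <= -1] <- NA

print(toy$R04_ADULTTYPE)
```

Writing thousands of these lines by hand is exactly what the parsing approach below avoids.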
We may extract the column names, the operator and the value to be replaced into separate columns:
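```r
library(dplyr)
library(tidyr)

# Split each Stata line into variable name, comparison operator and value;
# convert = TRUE turns 'value' into a number (-1) instead of the string "-1"
keydat <- rawMissing %>%
  extract(replacements, into = c('colnm', 'operator', 'value'),
          '^[^(]+\\((\\w+)\\s+([[:punct:]]+)\\s+(-?[0-9]+)', convert = TRUE)
```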
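If you want to check what the regular expression captures without loading tidyr, the same pattern can be applied in base R with `regexec()` and `regmatches()`. This is a sketch that swaps in base R for `extract()`; the vector `replacements` simply copies the six strings from the `dput()` output, and `pattern` and `parts` are illustrative names.

```r
# The six Stata lines from the dput() output in the question
replacements <- c(
  "replace R04_ADULTTYPE = . if (R04_ADULTTYPE <= -1 )",
  "replace R04R_A_AT0047 = . if (R04R_A_AT0047 <= -1 )",
  "replace R04R_A_AM0069 = . if (R04R_A_AM0069 <= -1 )",
  "replace R04R_A_AM0065_V2 = . if (R04R_A_AM0065_V2 <= -1 )",
  "replace R04_AM0066 = . if (R04_AM0066 <= -1 )",
  "replace R04_AM0070 = . if (R04_AM0070 <= -1 )"
)

# Same pattern as in the extract() call: variable, operator, numeric value
pattern <- "^[^(]+\\((\\w+)\\s+([[:punct:]]+)\\s+(-?[0-9]+)"

# For each string: the full match followed by the three capture groups
# (an unmatched line gives character(0), which becomes NA below)
parts <- regmatches(replacements, regexec(pattern, replacements))

keydat <- data.frame(
  colnm    = vapply(parts, `[`, character(1), 2),
  operator = vapply(parts, `[`, character(1), 3),
  value    = as.numeric(vapply(parts, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)
print(keydat)
```

Rows containing `NA` would point to Stata lines that do not follow the `replace X = . if (X op value)` shape and need manual attention.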
Then, using `keydat`, loop across the columns of the original dataset (say `df1`) that are listed in `keydat$colnm`, and do the replacements:
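```r
df2 <- df1 %>%
  mutate(across(all_of(keydat$colnm), ~ {
    # Look up the rule for the column currently being processed
    op  <- keydat$operator[match(cur_column(), keydat$colnm)]
    # as.numeric() ensures a numeric comparison; a character "-1" would compare as text
    val <- as.numeric(keydat$value[match(cur_column(), keydat$colnm)])
    # match.fun(op) turns "<=" into the `<=` function; matching values become NA
    replace(., match.fun(op)(., val), NA)
  }))
```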
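A dependency-free variant of the same idea loops over the rules in base R. This is a sketch under stated assumptions: the two-row `keydat`, the toy `df1` and the helper `apply_rules()` are hypothetical, and it replaces `across()` with a plain `for` loop.

```r
# Hypothetical rule table, shaped like 'keydat' (one row per Stata line)
keydat <- data.frame(
  colnm    = c("R04_ADULTTYPE", "R04_AM0066"),
  operator = c("<=", "<="),
  value    = c(-1, -1),
  stringsAsFactors = FALSE
)

# Hypothetical toy data standing in for 'df1'
df1 <- data.frame(
  R04_ADULTTYPE = c(2, -1, -9, NA),
  R04_AM0066    = c(-4, 0, 5, -1),
  other         = c(-1, -1, -1, -1)   # not listed in keydat, so left untouched
)

apply_rules <- function(data, rules) {
  for (i in seq_len(nrow(rules))) {
    col <- rules$colnm[i]
    # Fail loudly on unknown columns, as all_of() does in the dplyr version
    stopifnot(col %in% names(data))
    # match.fun("<=") returns the `<=` function, so any comparison operator works
    hit <- match.fun(rules$operator[i])(data[[col]], as.numeric(rules$value[i]))
    # which() drops NA comparisons, i.e. values that are already missing
    data[[col]][which(hit)] <- NA
  }
  data
}

df2 <- apply_rules(df1, keydat)
print(df2)
```

One design difference: looping over rows applies every rule, so a column that appears in several Stata lines (for example with different thresholds) receives all of them, whereas `match()` inside `across()` only picks up the first rule for each column.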