Transfer Many Stata Replaces to R


I have a couple thousand lines of Stata code from a peer whose general aim is to replace negative (missing) values with a proper missing value (.), and I need to transfer this code to R. To do so, I have saved the code as a single column of character strings. The replacements essentially look like the following, ad nauseam:

replace R04_ADULTTYPE = . if (R04_ADULTTYPE <= -1 )

The R04_ names are variables in a data set, so I hope to transfer these lines of Stata to R efficiently.

I have tried separating the strings into pieces so that I can iterate over a list of the variables that need replacing, but I am running low on ideas. Any ideas on how to transfer these replaces en masse to R, given that I have them as a data set of character strings? My expected output is essentially the result of running these many Stata replaces in R; the replace statements are shown in the data below.

Dput of the head of the data (rawMissing). Thanks!

# Data (many Stata replaces)
dput(head(rawMissing))
structure(list(replacements = c("replace R04_ADULTTYPE = . if (R04_ADULTTYPE <= -1 )", 
"replace R04R_A_AT0047 = . if (R04R_A_AT0047 <= -1 )", "replace R04R_A_AM0069 = . if (R04R_A_AM0069 <= -1 )", 
"replace R04R_A_AM0065_V2 = . if (R04R_A_AM0065_V2 <= -1 )", 
"replace R04_AM0066 = . if (R04_AM0066 <= -1 )", "replace R04_AM0070 = . if (R04_AM0070 <= -1 )"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

# Expected output would be efficiently conducting these many replaces in R

Solution

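  • We can extract the column name, the comparison operator and the threshold value from each line into separate columns

    library(dplyr)
    library(tidyr)
    # Split each Stata line into column name, operator and threshold;
    # convert = TRUE makes 'value' numeric rather than character
    keydat <- rawMissing %>%
         extract(replacements, into = c('colnm', 'operator', 'value'), 
             '^[^(]+\\((\\w+)\\s+([[:punct:]]+)\\s+(-?[0-9]+)', convert = TRUE)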

    Then, using 'keydat', apply the replacements to the original dataset (say 'df1') by looping across the columns listed in keydat$colnm

    df2 <- df1 %>%
       mutate(across(all_of(keydat$colnm), ~ 
             {
             # look up the rule for the column currently being processed
             i <- match(cur_column(), keydat$colnm)
             op <- keydat$operator[i]
             val <- as.numeric(keydat$value[i])  # compare as a number, not as a string
             replace(., match.fun(op)(., val), NA)
             }))
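
For anyone who wants to check the idea end to end without the tidyverse, here is a minimal base R sketch of the same parse-then-apply approach. It is not the code from the answer above: `rules`, `pattern`, and the toy `df1` values are made up for illustration, and it assumes every line has the form `replace X = . if (X <op> <integer> )`.

```r
# Toy stand-ins for the real objects (hypothetical values, for illustration only)
rawMissing <- data.frame(
  replacements = c(
    "replace R04_ADULTTYPE = . if (R04_ADULTTYPE <= -1 )",
    "replace R04_AM0066 = . if (R04_AM0066 <= -1 )"
  ),
  stringsAsFactors = FALSE
)
df1 <- data.frame(
  R04_ADULTTYPE = c(1, -1, 2, -9),
  R04_AM0066    = c(-7, 3, 4, -1),
  OTHER         = c(-1, 0, 1, 2)   # not named in any rule, so left untouched
)

# Pull variable, operator and threshold out of the parenthesised condition
pattern <- "\\((\\w+)\\s+([<>=!]+)\\s+(-?[0-9]+)"
parts <- regmatches(rawMissing$replacements,
                    regexec(pattern, rawMissing$replacements))
rules <- data.frame(
  colnm    = vapply(parts, `[`, character(1), 2),
  operator = vapply(parts, `[`, character(1), 3),
  value    = as.numeric(vapply(parts, `[`, character(1), 4)),
  stringsAsFactors = FALSE
)

# Apply each rule in turn to a copy of the data
df2 <- df1
for (i in seq_len(nrow(rules))) {
  col <- rules$colnm[i]
  cmp <- match.fun(rules$operator[i])      # e.g. "<=" becomes the `<=` function
  hit <- cmp(df2[[col]], rules$value[i])   # TRUE where the Stata condition holds
  df2[[col]][which(hit)] <- NA             # which() skips rows that are already NA
}
```

With the toy data, the negative entries in the two R04_ columns become NA while OTHER keeps its -1, since no rule names it. Converting the threshold with `as.numeric()` matters: comparing a numeric column against the string "-1" would make R compare the values as text.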