Search code examples
rstringdplyrreplaceacross

how to replace specific strings across many columns?


How can I map the observation values (in df1) onto the correct cells in the reference data frame (df2)? The cells are spread across (many!) different columns, so I cannot use a simple gsub on df2$specific.columnn.name

df1 contains observation values and IDs, df2 contains reference data (IDs and their 'consistency rules')

I want to replace all IDs in df2 with the observation values from DF1.

df1 <- data.frame(x = c("id1", "id2", "id3", "id4"), y = c(18, 21, 44, 800))

head(df1)
#    x   y
#  id1  18
#  id2  21
#  id3  44
#  id4 800

df2 <-data.frame(check = c("id2", "id3", "id1", "id1"),
                 checkpart2 = c("+", "==", "*", ">"), checkpart3 = c("id1", "id4", "id4", "id2"), checkpart4 =  c("==", NA, "==", NA), checkpart5 = c("id4", NA,"id1",NA))

head(df2)
#  check checkpart2 checkpart3 checkpart4 checkpart5
#   id2          +        id1         ==        id4
#   id3         ==        id4       <NA>       <NA>
#   id1          *        id4         ==        id1
#   id1          >        id2       <NA>       <NA>

the result should look like this, but then with 100+ columns:

head(df2)
check checkpart2 checkpart3 checkpart4 checkpart5
1   21          +        18     ==        800
2   44          ==       800   <NA>       <NA>
3   18          *        800    ==        18
4   18          >        21    <NA>       <NA>

Solution

  • The dplyr answer below will serve your purpose on any number of rows without inputting column names even. Demonstration of your sample.

    df1 <- data.frame(x = c("id1", "id2", "id3", "id4"), y = c(18, 21, 44, 800))
    df2 <-data.frame(check = c("id2", "id3", "id1", "id1"),
                     checkpart2 = c("+", "==", "*", ">"), checkpart3 = c("id1", "id4", "id4", "id2"), checkpart4 =  c("==", NA, "==", NA), checkpart5 = c("id4", NA,"id1",NA))
    
    library(dplyr, warn.conflicts = F)
    
    df2 %>%
      mutate(across(everything(), ~ifelse(. %in% df1$x, df1$y[match(., df1$x)], .)))
    #>   check checkpart2 checkpart3 checkpart4 checkpart5
    #> 1    21          +         18         ==        800
    #> 2    44         ==        800       <NA>       <NA>
    #> 3    18          *        800         ==         18
    #> 4    18          >         21       <NA>       <NA>
    

    Created on 2021-06-30 by the reprex package (v2.0.0)