How can I map the observation values (in df1) onto the correct cells in the reference data frame (df2)? The cells are spread across (many!) different columns, so I cannot use a simple gsub on df2$specific.columnn.name
df1 contains observation values and IDs, df2 contains reference data (IDs and their 'consistency rules')
I want to replace all IDs in df2 with the observation values from DF1.
df1 <- data.frame(x = c("id1", "id2", "id3", "id4"), y = c(18, 21, 44, 800))
head(df1)
# x y
# id1 18
# id2 21
# id3 44
# id4 800
df2 <-data.frame(check = c("id2", "id3", "id1", "id1"),
checkpart2 = c("+", "==", "*", ">"), checkpart3 = c("id1", "id4", "id4", "id2"), checkpart4 = c("==", NA, "==", NA), checkpart5 = c("id4", NA,"id1",NA))
head(df2)
# check checkpart2 checkpart3 checkpart4 checkpart5
# id2 + id1 == id4
# id3 == id4 <NA> <NA>
# id1 * id4 == id1
# id1 > id2 <NA> <NA>
the result should look like this, but then with 100+ columns:
head(df2)
check checkpart2 checkpart3 checkpart4 checkpart5
1 21 + 18 == 800
2 44 == 800 <NA> <NA>
3 18 * 800 == 18
4 18 > 21 <NA> <NA>
The dplyr
answer below will serve your purpose on any number of rows without inputting column names even. Demonstration of your sample.
df1 <- data.frame(x = c("id1", "id2", "id3", "id4"), y = c(18, 21, 44, 800))
df2 <-data.frame(check = c("id2", "id3", "id1", "id1"),
checkpart2 = c("+", "==", "*", ">"), checkpart3 = c("id1", "id4", "id4", "id2"), checkpart4 = c("==", NA, "==", NA), checkpart5 = c("id4", NA,"id1",NA))
library(dplyr, warn.conflicts = F)
df2 %>%
mutate(across(everything(), ~ifelse(. %in% df1$x, df1$y[match(., df1$x)], .)))
#> check checkpart2 checkpart3 checkpart4 checkpart5
#> 1 21 + 18 == 800
#> 2 44 == 800 <NA> <NA>
#> 3 18 * 800 == 18
#> 4 18 > 21 <NA> <NA>
Created on 2021-06-30 by the reprex package (v2.0.0)