Search code examples
rdataframereplacedplyrremap

Replace all values in dataframe using another dataframe as key in R


I have two dataframes and I want to replace all values ( in all the columns) of df1 using the equivalent value in df2 (df2$value).

df1 enter image description here

structure(list(Cell_ID = c(7L, 2L, 3L, 10L), n_1 = c(0L, 0L, 
0L, 0L), n_2 = c(9L, 1L, 4L, 1L), n_3 = c(10L, 4L, 5L, 2L), n_4 = c(NA, 
5L, NA, 4L), n_5 = c(NA, 7L, NA, 6L), n_6 = c(NA, 9L, NA, 8L), 
    n_7 = c(NA, 10L, NA, 3L)), class = "data.frame", row.names = c(NA, 
-4L))

df2 enter image description here

structure(list(Cell_ID = 0:10, value = c(5L, 100L, 200L, 300L, 
400L, 500L, 600L, 700L, 800L, 900L, 1000L)), class = "data.frame", row.names = c(NA, 
-11L))

The desired output would look like this:

enter image description here

So far I tried this as suggested in another similar post but its not doing it well (randomly missing some points)

key= df2$Cell_ID
value = df2$value
lapply(1:8,FUN = function(i){df1[df1 == key[i]] <<- value[i]})

enter image description here

Note that the numbers have been just multiplied by 10 for ease in the example the real data has numbers are all over the place so just multiplying the dataframe by 10 won't work.


Solution

  • An option is match the elements with the 'Cell_ID' of second dataset and use that as index to return the corresponding 'value' from 'df2'

    library(dplyr)
    df1 %>%
         mutate(across(everything(), ~  df2$value[match(., df2$Cell_ID)]))
    

    -output

    #  Cell_ID n_1 n_2  n_3 n_4 n_5 n_6  n_7
    #1     700   5 900 1000  NA  NA  NA   NA
    #2     200   5 100  400 500 700 900 1000
    #3     300   5 400  500  NA  NA  NA   NA
    #4    1000   5 100  200 400 600 800  300
    

    Or another option is to use a named vector to do the match

    library(tibble)
    df1 %>%
          mutate(across(everything(), ~ deframe(df2)[as.character(.)]))
    

    The base R equivalent is

    df1[] <- lapply(df1, function(x) df2$value[match(x, df2$Cell_ID)])