Search code examples
rdataframedplyrreplacemultiple-columns

Replace multiple values using another dataframe


I want to replace different values in different columns using another data frame

df1

v1   v2 v3  v4
RDA  NA ILL NA
OPCS NA NA  NA
OJK  BR ILL PE

df2

v1  v2  v3  v5
RDA COL ILL P3
OPCS BR ILL P2

and I would like the output data frame to look like this:

v1  v2  v3  v4 v5
RDA COL ILL NA P3
OPCS BR ILL NA P2
OJK  BR ILL PE NA

I have tried using inner_join, but don't work and looks like this

v1  v2.x v2.y v3.x v3.y v4 v5
RDA NA   COL  ILL  NA    NA P3
OPCS NA  BR   NA   ILL   NA P2

Any help would be much appreciated.


Solution

  • library(dplyr)
    
    new <- setdiff(names(df2), names(df1))
    merge_vars <- "v1"
    
    rows_patch(df1, select(df2, -all_of(new)), by = merge_vars) |>
      left_join(select(df2, all_of(c(merge_vars, new))), by = merge_vars)
    

    ?rows_patch only overwrites NA values. If you want to update any values you can use rows_update. Note, to use rows_patch all columns in df2 need to exist in df1, which is why I use setdiff to temporarily remove "v5".

    Output

        v1  v2  v3   v4 v5
    1  RDA COL ILL <NA> P3
    2 OPCS  BR ILL <NA> P2