
R - Merge/Join and only replace if missing (Priority?)


Is it possible to only merge data for values that are missing?

For example, say I have two datasets. D1 is my priority dataset, but I want to use information from D2 to fill in any missing data in D1. If D1 and D2 have conflicting values, then I want to keep the values in D1 and discard D2.

D1 <- data.frame(
  id=seq(1,3),
  x=c("cow",NA,"sheep"))

D2 <- data.frame(
  id=seq(1,3),
  x=c("cow","turtle","parrot"))

Ideally, the final dataset would look like this:

D3 <- data.frame(
  id=seq(1,3),
  x=c("cow","turtle","sheep"))

turtle would replace the NA, but parrot wouldn't replace sheep.


Solution

  • In base R, you can use match() to look up each missing row's id in D2 and fill only those rows:

    # Rows of D1 where x is missing
    inds <- is.na(D1$x)
    # For those rows only, find the matching id in D2 and copy its x
    D1$x[inds] <- D2$x[match(D1$id[inds], D2$id)]
    D1
    
    #  id      x
    #1  1    cow
    #2  2 turtle
    #3  3  sheep
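
  • A join-based variant of the same idea, as a rough sketch rather than the original answer's method: left-join D2 onto D1 with merge(), then keep D1's value unless it is NA. The suffixes ".d1" and ".d2" and the result name D3 are chosen here for illustration. stringsAsFactors = FALSE is set explicitly on the assumption that the code may run on R older than 4.0, where character columns would otherwise become factors.

```r
D1 <- data.frame(id = 1:3, x = c("cow", NA, "sheep"),
                 stringsAsFactors = FALSE)
D2 <- data.frame(id = 1:3, x = c("cow", "turtle", "parrot"),
                 stringsAsFactors = FALSE)

# Left join: every row of D1 is kept; the two x columns become x.d1 and x.d2
merged <- merge(D1, D2, by = "id", all.x = TRUE, suffixes = c(".d1", ".d2"))

# Prefer D1's value; fall back to D2's only where D1 is missing
D3 <- data.frame(
  id = merged$id,
  x  = ifelse(is.na(merged$x.d1), merged$x.d2, merged$x.d1),
  stringsAsFactors = FALSE
)
D3
```

    Unlike the in-place match() approach, this leaves D1 untouched and returns a new data frame. Note that merge() sorts the result by id by default, and ids that appear only in D2 are dropped because of all.x = TRUE.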