Search code examples
rdplyrleft-join

What is the correct way to join two dataframes using dplyr?


I'm a bit perplexed by this. Why does the code below not work as I expect? I want the codes in 'data' to replace the NA's in 'data2'. What am I doing wrong?

(I want to use dplyr)

library(dplyr)

data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Code = c(1234, 5678, 9012, 3456)
)

data2 <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Alice", "Bob"),
  Code = c(1234, 5678, 9012, 3456, NA, NA)
)


left_join(data2, data, join_by(Name))

Solution

  • A join will not replace any missing values. But you can achieve your desired result using an additional mutate step with coalesce to replace the NAs:

    library(dplyr, warn=FALSE)
    
    left_join(data2, data, join_by(Name)) |>
      mutate(
        Code = coalesce(Code.x, Code.y),
        .keep = "unused"
      )
    #>      Name Code
    #> 1   Alice 1234
    #> 2     Bob 5678
    #> 3 Charlie 9012
    #> 4   David 3456
    #> 5   Alice 1234
    #> 6     Bob 5678