Search code examples
rdplyrmatchleft-join

R Multiple Dataframe Column Matches to Populate Column


I have a dataframe 'df1' that looks like:

structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO", 
"ISLA", "TROP"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-"), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"), Lat = c(NA, NA, NA, 
NA, NA, "51.23"), Long = c(NA, NA, NA, NA, NA, "-109.26")), row.names = c(NA, 6L), class = "data.frame")

And a second dataframe 'df2' that looks like:

structure(list(MAPS_code = c("SAFR", "SAGA", "ELPU", "ISLA", 
"SABO", "SATE", "QUST", "SARI", "PANA", "COPA", "LOAN", "GAPA", 
"MELI", "CAGO", "PINO", "GABO", "RIJA", "FILA", "AMIS"), Lat = c(8.765833, 
8.751389, 8.768611, 8.835833, 8.801111, 8.808333, 8.815, 8.827778, 
8.781667, 8.778333, 8.783333, 8.800833, 8.790278, 8.754444, 8.844444, 
8.801389, 8.786667, 8.785278, 8.952222), Long = c(-82.94277, 
-82.951111, -82.95, -82.963056, -82.917222, -82.924444, -82.923889, 
-82.924167, -82.896944, -82.955833, -82.938611, -82.972222, -82.967222, 
-82.925833, -82.97, -82.972222, -82.964722, -82.976111, -82.833333
), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"
), Location = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", 
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", 
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-")), class = "data.frame", row.names = c(NA, 
-19L))

How can I make populate each row of 'Lat' and 'Long' of df1 from 'Lat' and 'Long' of df2 when 'Contact', 'Location', and 'MAPS_code' of the corresponding rows match between df1 and df2? So that the result of df1 looks like:

structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO", 
"ISLA", "TROP"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-", 
"LCP-"), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", 
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"), Lat = c("8.827778", "8.801111", "8.801111
", "8.801111", "8.835833", "51.23"), Long = c("-82.92417", "-82.91722", "-82.91722", "-82.91722", "-82.96306", "-109.26")), row.names = c(NA, 6L), class = "data.frame")

Note, if there are already data in Lat and Long, I don't want them deleted or written over with NAs.


Solution

  • Updated Answer We could use dplyr::coalesce to retrieve the values which is not NA among the two pairs of Lat and Long:

    library(dplyr)
    
    df1 %>%
      rename(Location = Location_code) %>%
      left_join(df2, by = c('MAPS_code', 'Contact', 'Location')) %>%
      mutate(across(ends_with('.x'), as.double)) %>%
      mutate(Lat = coalesce(!!!(select(., starts_with('Lat')))), 
             Long = coalesce(!!!select(., starts_with('Long')))) %>%
      select(!contains('.'))
    
    
      MAPS_code Location          Contact       Lat       Long
    1      SARI     LCP- Chase Mendenhall  8.827778  -82.92417
    2      SABO     LCP- Chase Mendenhall  8.801111  -82.91722
    3      SABO     LCP- Chase Mendenhall  8.801111  -82.91722
    4      SABO     LCP- Chase Mendenhall  8.801111  -82.91722
    5      ISLA     LCP- Chase Mendenhall  8.835833  -82.96306
    6      TROP     LCP-        Tom Jones 51.230000 -109.26000