I have a dataframe 'df1' that looks like:
structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO",
"ISLA", "TROP"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-"), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"), Lat = c(NA, NA, NA,
NA, NA, "51.23"), Long = c(NA, NA, NA, NA, NA, "-109.26")), row.names = c(NA, 6L), class = "data.frame")
And a second dataframe 'df2' that looks like:
structure(list(MAPS_code = c("SAFR", "SAGA", "ELPU", "ISLA",
"SABO", "SATE", "QUST", "SARI", "PANA", "COPA", "LOAN", "GAPA",
"MELI", "CAGO", "PINO", "GABO", "RIJA", "FILA", "AMIS"), Lat = c(8.765833,
8.751389, 8.768611, 8.835833, 8.801111, 8.808333, 8.815, 8.827778,
8.781667, 8.778333, 8.783333, 8.800833, 8.790278, 8.754444, 8.844444,
8.801389, 8.786667, 8.785278, 8.952222), Long = c(-82.94277,
-82.951111, -82.95, -82.963056, -82.917222, -82.924444, -82.923889,
-82.924167, -82.896944, -82.955833, -82.938611, -82.972222, -82.967222,
-82.925833, -82.97, -82.972222, -82.964722, -82.976111, -82.833333
), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"
), Location = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-", "LCP-", "LCP-", "LCP-", "LCP-")), class = "data.frame", row.names = c(NA,
-19L))
How can I make populate each row of 'Lat' and 'Long' of df1 from 'Lat' and 'Long' of df2 when 'Contact', 'Location', and 'MAPS_code' of the corresponding rows match between df1 and df2? So that the result of df1 looks like:
structure(list(MAPS_code = c("SARI", "SABO", "SABO", "SABO",
"ISLA", "TROP"), Location_code = c("LCP-", "LCP-", "LCP-", "LCP-", "LCP-",
"LCP-"), Contact = c("Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall",
"Chase Mendenhall", "Chase Mendenhall", "Chase Mendenhall"), Lat = c("8.827778", "8.801111", "8.801111
", "8.801111", "8.835833", "51.23"), Long = c("-82.92417", "-82.91722", "-82.91722", "-82.91722", "-82.96306", "-109.26")), row.names = c(NA, 6L), class = "data.frame")
Note, if there are already data in Lat and Long, I don't want them deleted or written over with NAs.
Updated Answer
We could use dplyr::coalesce
to retrieve the values which is not NA
among the two pairs of Lat
and Long
:
library(dplyr)
df1 %>%
rename(Location = Location_code) %>%
left_join(df2, by = c('MAPS_code', 'Contact', 'Location')) %>%
mutate(across(ends_with('.x'), as.double)) %>%
mutate(Lat = coalesce(!!!(select(., starts_with('Lat')))),
Long = coalesce(!!!select(., starts_with('Long')))) %>%
select(!contains('.'))
MAPS_code Location Contact Lat Long
1 SARI LCP- Chase Mendenhall 8.827778 -82.92417
2 SABO LCP- Chase Mendenhall 8.801111 -82.91722
3 SABO LCP- Chase Mendenhall 8.801111 -82.91722
4 SABO LCP- Chase Mendenhall 8.801111 -82.91722
5 ISLA LCP- Chase Mendenhall 8.835833 -82.96306
6 TROP LCP- Tom Jones 51.230000 -109.26000