Search code examples
rmergegeofull-join

Join two datasets based on latitude and distance


I have two datasets that I am trying to merge together based on the latitude. The latitudes do not exactly match so it has to be closest latitude by say 30km. When I try full join, the output data frame is sequential (i.e., it spits out df then df2 rather than merging/joining them. Is there a way I can merge these two data frames into one based on how close they are in latitude?

Var1 <- c(1, 4, 6, 2, 8, 9, 2, 4, 5, 4)
Var2 <- c(1.9, 2.4, 3.7, 7.3, 2.4, 4.8, 9, 2.3, 9, 1.7)
Lat <- c(57.33, 58.21, 58.93, 59.23, 59.87, 60.29, 60.99, 61.5, 61.8, 62.5)
Lon <- c(131, 131, 131, 130.9, 130.99, 130.5, 131, 131, 131.2, 131)

df <- as.data.frame(cbind(Var1, Var2, Lat, Lon))

Var1 <- c(4, 3, 6, 8, 1, 9, 3, 6, 5, 8)
Var2 <- c(6.2, 3, 5.6, 5.1, 9.0, 2.4, 4.9, 8.2, 3.1, 6.2)
Lat <- c(57.15, 58.3, 59.0, 59.4, 60.0, 60.1, 61.0, 61.6, 62, 62.7)
Lon <- c(131, 131, 131, 130.9, 130.99, 130.5, 131, 131, 131.2, 131)

df2 <- as.data.frame(cbind(Var1, Var2, Lat, Lon))

Vic = df%>%
  full_join(df2, by = c("Lat" = "Lat"))

I also tried geo_join but I get an error

merged.dfs <- geo_join(df, df2, by = NULL, method = "haversine", mode = "left", max_dist = 1) 

Joining by: c("Var1", "Var2", "Lat", "Lon")
Error in FUN(X[[i]], ...) : 
Trying to join on Var1, Var2, Lat, Lon; geo_join needs exactly   two columns (latitude and longitude)

Solution

  • The fuzzyjoin package and the geo_join function is definitely the way to go. You were close with your try, you just need to set up some parameters as the error mentioned.

    library(fuzzyjoin)
    Var1 <- c(1, 4, 6, 2, 8, 9, 2, 4, 5, 4)
    Var2 <- c(1.9, 2.4, 3.7, 7.3, 2.4, 4.8, 9, 2.3, 9, 1.7)
    Lat <- c(57.33, 58.21, 58.93, 59.23, 59.87, 60.29, 60.99, 61.5, 61.8, 62.5)
    Lon <- c(131, 131, 131, 130.9, 130.99, 130.5, 131, 131, 131.2, 131)
    
    df <- as.data.frame(cbind(Var1, Var2, Lat, Lon))
    
    Var1 <- c(4, 3, 6, 8, 1, 9, 3, 6, 5, 8)
    Var2 <- c(6.2, 3, 5.6, 5.1, 9.0, 2.4, 4.9, 8.2, 3.1, 6.2)
    Lat <- c(57.15, 58.3, 59.0, 59.4, 60.0, 60.1, 61.0, 61.6, 62, 62.7)
    Lon <- c(131, 131, 131, 130.9, 130.99, 130.5, 131, 131, 131.2, 131)
    
    df2 <- as.data.frame(cbind(Var1, Var2, Lat, Lon))
    
    Vic <-  df%>%
      geo_join(df2, by = c("Lat" = "Lat", "Lon" = "Lon"), method = "haversine", mode = "left", max_dist = 30, unit = "km" )
    

    Look into this documentation if needed