Tags: r, join, dplyr, tidyverse, tibble

Joining lat/lon data frames by nearest distance


Let's say I have a regular latitude/longitude grid and data at irregular locations, like this:

grid = tidyr::crossing(lon = seq(0, 1, 0.25), lat = seq(0, 1, 0.25))
data = tibble::tibble(lon = runif(4), lat=runif(4), y=rnorm(4))

How do I use, for example, dplyr::inner_join and join_by to join these data frames so that, for each row in data, I get its y value together with the lat and lon of the nearest grid point, i.e. the grid point that minimizes (grid$lon - data$lon)^2 + (grid$lat - data$lat)^2?
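As far as I know, join_by() only supports equality, inequality, rolling (closest()) and overlap conditions on individual columns, so a two-column distance like the one above cannot be written as a join condition directly. A brute-force sketch of the stated criterion using plain dplyr is to pair every data row with every grid point and keep the closest one. It assumes dplyr >= 1.1.0 (for cross_join() and the `by` argument of slice_min()); the column names id, d2, grid_lon and grid_lat are illustrative choices, not part of the question.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for cross_join() and slice_min(by = ...)

set.seed(42)
grid <- tidyr::crossing(lon = seq(0, 1, 0.25), lat = seq(0, 1, 0.25))
data <- tibble::tibble(lon = runif(4), lat = runif(4), y = rnorm(4))

nearest <- data |>
  # Tag each data row so its candidate grid points can be grouped later
  mutate(id = row_number()) |>
  # Pair every data row with every grid point (4 x 25 = 100 rows);
  # clashing grid columns get the "_grid" suffix, data columns keep their names
  cross_join(grid, suffix = c("", "_grid")) |>
  # Squared Euclidean distance in lon/lat units, as in the question
  mutate(d2 = (lon_grid - lon)^2 + (lat_grid - lat)^2) |>
  # Keep only the closest grid point for each data row
  slice_min(d2, n = 1, with_ties = FALSE, by = id) |>
  select(id, y, grid_lon = lon_grid, grid_lat = lat_grid)

nearest
```

This materializes nrow(data) * nrow(grid) rows, which is fine for small inputs but grows quickly; a spatial index (as used by sf) scales better for large grids.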


Solution

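  • The sf package is designed for manipulating spatial geometries (e.g. points, lines, polygons). Convert both data frames to sf objects, then perform a spatial join with st_join(), passing join = st_nearest_feature. Note that st_join(x, y) keeps the rows of x and attaches the attributes of the nearest feature in y, so the code below gives each grid point the y value of its nearest data point.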

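    library(sf)
    library(tidyverse)
    
    set.seed(42)
    
    grid <- tidyr::crossing(lon = seq(0, 1, 0.25), lat = seq(0, 1, 0.25))
    data <- tibble::tibble(lon = runif(4), lat = runif(4), y = rnorm(4))
    
    # Convert both tables to sf point objects. No CRS is set, so distances
    # are planar (Euclidean in lon/lat units)
    grid_sf <- st_as_sf(grid, coords = c("lon", "lat"))
    data_sf <- st_as_sf(data, coords = c("lon", "lat"))
    
    # For each grid point, attach the attributes (y) of the nearest data point
    joined <- st_join(grid_sf, data_sf, join = st_nearest_feature)
    
    # Grid points colored by the y of their nearest data point;
    # the original data points are drawn as larger filled squares
    ggplot() +
      geom_sf(data = joined, aes(col = y)) +
      geom_sf(data = data_sf, aes(col = y, fill = y), size = 4, shape = 22)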
    

    Created on 2024-07-12 with reprex v2.1.0
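To get exactly what the question asks for (one row per data point, carrying the lon/lat of its nearest grid point), the same sf approach can be run with the arguments swapped. This is a sketch, assuming lon/lat are treated as planar coordinates with no CRS, which matches the squared-difference criterion in the question; the names data_joined and result are illustrative. `remove = FALSE` keeps the grid's lon/lat as ordinary columns so they survive the join.

```r
library(sf)

set.seed(42)
grid <- tidyr::crossing(lon = seq(0, 1, 0.25), lat = seq(0, 1, 0.25))
data <- tibble::tibble(lon = runif(4), lat = runif(4), y = rnorm(4))

# remove = FALSE keeps lon/lat as regular columns next to the geometry,
# so the grid coordinates are carried over by the join
grid_sf <- st_as_sf(grid, coords = c("lon", "lat"), remove = FALSE)
data_sf <- st_as_sf(data, coords = c("lon", "lat"))

# x = data_sf: one output row per data point, with the columns of the
# nearest grid point (lon, lat) attached
data_joined <- st_join(data_sf, grid_sf, join = st_nearest_feature)

# Drop the (data) geometry to get a plain table: y, lon, lat (grid coordinates)
result <- st_drop_geometry(data_joined)
result
```

Since st_nearest_feature(x, y) returns, for each feature in x, the index of the nearest feature in y, `grid[st_nearest_feature(data_sf, grid_sf), ]` gives the same grid coordinates without a join. Setting a geographic CRS such as `crs = 4326` would typically make sf measure great-circle distances instead of the planar criterion used here.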