Calculate Distance using Latitude and Longitude data in Different Data frames of different lengths with loop


I have 2 data frames of different lengths, each with a longitude and latitude coordinate. I would like to connect the two data frames by calculating the distance between the lat/long points.

For simplicity, Data frame A (starting point) has the following structure

ID     long      lat 
1 -89.92702 44.19367 
2 -89.92525 44.19654 
3 -89.92365 44.19756 
4 -89.91949 44.19848 
5 -89.91359 44.19818  

And Data frame B (end point) has a similar structure but shorter

ID      LAT       LON
1  43.06519 -87.91446
2  43.14490 -88.07172
3  43.08969 -87.91202

I would like to calculate the distance between every pair of points, so that I end up with a data frame, merged to A, containing the distances between A1 and B1, A1 and B2, and A1 and B3. This should repeat for every ID in A$ID against every ID in B$ID.

A$ID   B$ID
1      1
2      2
3      3
4
5

Prior to posting this, I consulted several Stack Overflow threads (including this one) and this Medium post, but I am not sure how to approach the looping, especially since the data frames are of different lengths.

Thank you!


Solution

  • Here's a solution using two packages: sf and tidyverse. The first is used to convert the data into simple features and calculate the distances, while the second is used to put the data in the desired format.

    library(tidyverse)
    library(sf)
    
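    # Transform data into simple features.
    # crs = 4326 assumes the coordinates are WGS 84 longitude/latitude degrees;
    # without a CRS, st_distance() would return planar distances in degrees
    sfA <- st_as_sf(A, coords = c("long","lat"), crs = 4326)
    sfB <- st_as_sf(B, coords = c("LON","LAT"), crs = 4326)
    
    # Calculate distance (in metres) between all entries of sfA and sfB
    distances <- st_distance(sfA, sfB, by_element = FALSE)
    # Drop the units class so the result is a plain numeric matrix
    distances <- units::drop_units(distances)
    # Set colnames for distances matrix from the IDs in B
    colnames(distances) <- paste0("B", B$ID)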
    
    # Put the results in the desired format
    # Transform distances matrix into a tibble
    as_tibble(distances) %>%
      # Get row names and add them as a column
      rownames_to_column() %>%
      # Set ID as the column name for the row numbers
      rename("ID" = "rowname") %>%
      # Transform ID to numeric so it can be joined with A$ID
      mutate(ID = as.numeric(ID)) %>%
      # Join with the original A data frame
      right_join(A, by = "ID") %>%
      # Change the order of columns
      select(ID, long, lat, everything()) %>%
      # Put data into long format
      pivot_longer(cols = starts_with("B"),
                   names_to = "B_ID",
                   names_pattern = "B(\\d+)",
                   values_to = "distance")
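
Since the question asks specifically about looping over data frames of different lengths, here is a minimal base R sketch of the same idea that avoids an explicit loop by building every A/B row combination with expand.grid(). The helper haversine_m() and the column names A_ID, B_ID and distance_m are illustrative names, not part of any package, and the helper assumes a spherical Earth with a radius of 6371 km, so its results will differ slightly from the ellipsoidal or s2-based distances that sf reports.

```r
# Example data copied from the question
A <- data.frame(ID = 1:5,
                long = c(-89.92702, -89.92525, -89.92365, -89.91949, -89.91359),
                lat  = c(44.19367, 44.19654, 44.19756, 44.19848, 44.19818))
B <- data.frame(ID = 1:3,
                LAT = c(43.06519, 43.14490, 43.08969),
                LON = c(-87.91446, -88.07172, -87.91202))

# Hypothetical helper: vectorised Haversine great-circle distance in metres.
# Assumption: spherical Earth with radius r = 6371 km.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) just above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

# Every combination of a row of A with a row of B (5 x 3 = 15 pairs)
pairs <- expand.grid(A_row = seq_len(nrow(A)), B_row = seq_len(nrow(B)))

# One output row per (A, B) pair, in long format
result <- data.frame(
  A_ID = A$ID[pairs$A_row],
  B_ID = B$ID[pairs$B_row],
  distance_m = haversine_m(A$long[pairs$A_row], A$lat[pairs$A_row],
                           B$LON[pairs$B_row],  B$LAT[pairs$B_row])
)

# Sort so that all B points for A1 come first, then A2, and so on
result <- result[order(result$A_ID, result$B_ID), ]
```

Because the pairs are generated from the row counts of each data frame, the differing lengths of A and B do not need special handling. If the original coordinates are also wanted, result can be merged back onto A with merge(A, result, by.x = "ID", by.y = "A_ID").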