
Faster way of finding distances between combinations of points


I have a dataframe of points in different groups; the actual dataframe has over a thousand rows. For every combination of groups, I need to find the distance from each point in the combination to every other point in it, and then sum those distances for each point. I have a solution, but it is slow when I am dealing with, say, 63 combinations.

To illustrate my current solution, consider an example with only three groups. I sort them into all possible combinations, i.e. Combination 1 contains only group 1, Combination 4 contains groups 1 and 2, and so on (reproducible data below).
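As a rough sketch of how such combinations can be enumerated, base R's `combn` can list every non-empty subset of the groups. The names `groups` and `subsets` are illustrative and not part of the original data; with three groups this gives 2^3 - 1 = 7 combinations, and six groups would give the 63 mentioned above.

```r
# Hypothetical group labels, matching the sample data
groups <- c("Group1", "Group2", "Group3")

# All non-empty subsets: every way of choosing k groups, for k = 1..n
subsets <- unlist(
  lapply(seq_along(groups), function(k) combn(groups, k, simplify = FALSE)),
  recursive = FALSE
)
names(subsets) <- paste0("Combination", seq_along(subsets))

length(subsets)           # 7
subsets[["Combination4"]] # "Group1" "Group2"
```

The ordering produced here (singles first, then pairs, then the full set) happens to match the numbering used in the sample data.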

I then transform my dataframe into an sf object of points:

points <- points_csv %>% st_as_sf(coords = c('longitude', 'latitude'))

I then make a vector of the distinct combinations:

Combination_list = points$combination
Combination_list <- unique(Combination_list)

And use the following loop:

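Density_total <- data.frame()
for (b in Combination_list) {
  # Keep only the points that belong to this combination
  filtered <- filter(points, combination == b)
  x <- filtered$geometry

  # Loop over point indices; looping over the geometries themselves
  # yields coordinate vectors, which cannot be used as indices
  for (t in seq_along(x)) {
    test_point <- x[t]
    # Distances from this point to every point in the combination
    M <- st_distance(test_point, x)
    M <- unclass(M)

    D <- sum(M)
    df1 <- data.frame(D)
    Density_total <- rbind(Density_total, df1)
  }
}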
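One way to avoid the inner loop over points is to compute the whole distance matrix for a combination in one call and take its row sums. The sketch below uses base R's `dist()` rather than `st_distance()`, and assumes the coordinates can be treated as planar (as in the example, where no CRS is set). The data frame `pts` and the helper `total_distance` are illustrative names, and the data is a subset of the sample above.

```r
# A few rows from the sample data (Combination1, 2 and 4)
pts <- data.frame(
  combination = c("Combination1", "Combination1", "Combination2",
                  "Combination4", "Combination4", "Combination4"),
  latitude  = c(0.1989, 0.1989, 0.201, 0.1989, 0.1989, 0.201),
  longitude = c(-0.001, -0.0015, -0.0015, -0.001, -0.0015, -0.0015)
)

# Hypothetical helper: sum of distances from each point to all others
total_distance <- function(lon, lat) {
  d <- as.matrix(dist(cbind(lon, lat)))  # n x n Euclidean distance matrix
  unname(rowSums(d))
}

# One matrix per combination; results are written back in row order
pts$Distance <- numeric(nrow(pts))
for (i in split(seq_len(nrow(pts)), pts$combination)) {
  pts$Distance[i] <- total_distance(pts$longitude[i], pts$latitude[i])
}
print(pts)
```

This still loops, but only once per combination instead of once per point, and it avoids growing a data frame with `rbind` on every iteration.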

Reproducible data:

structure(list(Name = c("Group1", "Group1", "Group2", "Group3", 
"Group1", "Group1", "Group2", "Group1", "Group1", "Group3", "Group2", 
"Group3", "Group1", "Group2", "Group3"), combination = c("Combination1", 
"Combination1", "Combination2", "Combination3", "Combination4", 
"Combination4", "Combination4", "Combination5", "Combination5", 
"Combination5", "Combination6", "Combination6", "Combination7", 
"Combination7", "Combination7"), latitude = c(0.1989, 0.1989, 
0.201, 0.201, 0.1989, 0.1989, 0.201, 0.1989, 0.1989, 0.201, 0.201, 
0.201, 0.1989, 0.201, 0.201), longitude = c(-0.001, -0.0015, 
-0.0015, -0.001, -0.001, -0.0015, -0.0015, -0.001, -0.0015, -0.001, 
-0.0015, -0.001, -0.0015, -0.0015, -0.001)), class = "data.frame", row.names = c(NA, 
-15L), spec = structure(list(cols = list(Name = structure(list(), class = 
c("collector_character", 
"collector")), combination = structure(list(), class = c("collector_character", 
"collector")), latitude = structure(list(), class = c("collector_double", 
"collector")), longitude = structure(list(), class = c("collector_double", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1), class = "col_spec"))

Desired output should look something like this:

Distance      X       Y    Combination
0.000500000 0.1989 -0.0010 Combination1
0.000500000 0.1989 -0.0015 Combination1
0.000000000 0.2010 -0.0015 Combination2
0.000000000 0.2010 -0.0010 Combination3
0.002658703 0.1989 -0.0010 Combination4
0.002600000 0.1989 -0.0015 Combination4
0.004258703 0.2010 -0.0015 Combination4
0.002600000 0.1989 -0.0010 Combination5
0.002658703 0.1989 -0.0015 Combination5
0.004258703 0.2010 -0.0010 Combination5
0.000500000 0.2010 -0.0015 Combination6
0.000500000 0.2010 -0.0010 Combination6
0.004758703 0.1989 -0.0010 Combination7
0.004758703 0.1989 -0.0015 Combination7
0.004758703 0.2010 -0.0015 Combination7
0.004758703 0.2010 -0.0010 Combination7

Solution

  • Assuming your data is in a data.frame named points, here is a dplyr way to do it. You can use full_join to join the data to itself by combination, which generates every pair of points within each combination, and then calculate the distances. It takes less than a second on my machine with your sample data.

    library(dplyr)
    points %>% 
      full_join(points, by = c("combination" = "combination")) %>%
      mutate(distance = (longitude.x - longitude.y)^2 + (latitude.x - latitude.y)^2) %>%
      group_by(latitude.x, longitude.x, combination) %>%
      summarise(total = sum(distance)) %>%
      select(Distance = total, X = latitude.x, Y = longitude.x, combination) %>% 
      arrange(combination)
    `summarise()` regrouping output by 'latitude.x', 'longitude.x' (override with `.groups` argument)
    
    # A tibble: 15 x 4
    # Groups:   X, Y [4]
         Distance     X       Y combination 
            <dbl> <dbl>   <dbl> <chr>       
     1 0.00000025 0.199 -0.0015 Combination1
     2 0.00000025 0.199 -0.001  Combination1
     3 0          0.201 -0.0015 Combination2
     4 0          0.201 -0.001  Combination3
     5 0.00000466 0.199 -0.0015 Combination4
     6 0.00000491 0.199 -0.001  Combination4
     7 0.00000907 0.201 -0.0015 Combination4
     8 0.00000491 0.199 -0.0015 Combination5
     9 0.00000466 0.199 -0.001  Combination5
    10 0.00000907 0.201 -0.001  Combination5
    11 0.00000025 0.201 -0.0015 Combination6
    12 0.00000025 0.201 -0.001  Combination6
    13 0.00000907 0.199 -0.0015 Combination7
    14 0.00000466 0.201 -0.0015 Combination7
    15 0.00000491 0.201 -0.001  Combination7
    

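    In this sample set, Combinations 2 and 3 have a total distance of 0 because each contains only one point. Note that the distance column above is the squared Euclidean distance, so the totals shown are sums of squared distances; wrapping the expression in sqrt() gives totals in the units of the desired output (0.0005 instead of 0.00000025 for Combination1).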
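A minimal sketch of the same self-join with the square root applied, so that each total is a sum of Euclidean distances in coordinate units. It assumes dplyr is installed and uses a subset of the sample data; the object name `result` is illustrative.

```r
library(dplyr)

# A few rows from the sample data (Combination1, 2 and 4)
points <- data.frame(
  combination = c("Combination1", "Combination1", "Combination2",
                  "Combination4", "Combination4", "Combination4"),
  latitude  = c(0.1989, 0.1989, 0.201, 0.1989, 0.1989, 0.201),
  longitude = c(-0.001, -0.0015, -0.0015, -0.001, -0.0015, -0.0015)
)

result <- points %>%
  # Self-join: every pair of points within a combination.
  # Newer dplyr versions may warn about a many-to-many relationship,
  # which is expected here.
  full_join(points, by = "combination") %>%
  mutate(distance = sqrt((longitude.x - longitude.y)^2 +
                         (latitude.x - latitude.y)^2)) %>%
  group_by(combination, latitude.x, longitude.x) %>%
  summarise(Distance = sum(distance), .groups = "drop") %>%
  select(Distance, X = latitude.x, Y = longitude.x, combination) %>%
  arrange(combination)

print(result)
```

Because the rows are grouped by coordinates, two points with identical coordinates in the same combination would be collapsed into one row; adding a row identifier before the join would keep them separate.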