Search code examples
rshapefilegeorgdal

Measuring metrics of proximity (min, max, mean) of distance between polygons in the same shapefile


I am looking for a way to measure minimum, maximum, and average distance between the edges of polygons within the same planar projected shapefile (UTMs) using Program R. Is there a package out there that has these capabilities? So far I am coming up short, with the package "rgeos" being the closest with "gDistance", however this seems to only return the minimum distance between polygons. This is to include several measures of dispersion for polygons. As an example I have here the islands from the state of Hawaii. You can download the US states shapefile here. I then isolate and project the state of Hawaii:

library(rgdal)
library(rgeos)

states = readOGR(dsn=path.expand(),layer = "states")
hawaii = states[states$STATE_NAME == "Hawaii", ] 
hawaii = spTransform(hawaii, CRS("+proj=utm +zone=4 ellps=WGS84"))

I would like to measure the maximum distance between the edges of islands within the state of Hawaii, the minimum distance between island edges, and the mean distance between island edges. I have tried gDistance from rgeo, which should return minimum distance between 2 edges, but it currently returns zeros for me:

gDistance(hawaii)

Any ideas? Trying the following using the "byid" call just returns three 0s:

> gDistance(hawaii, hawaii, byid = TRUE)
  0
0 0

I hope to make this part of a for loop to assess proximity metrics for ~200 separate polygon shapefiles with multiple polygons in each file. I just need to calculate proximity within the polygons of each shapefile, not across different shapefiles. Thank you for the help.


Solution

  • Firstly, for the minimum, generally if you run gDistance(hawaii, byID = TRUE), you will get a matrix showing the minimum distance between every pair of islands(polygons) in the dataset. However, as discussed in the comments, for this to work, each island needs to have its own ID in the polygon file.

    For Hawaii and the shapefile you specified, this approach will work for getting the minimum distance:

    library(sp)    
    hawaii_out <- disaggregate(hawaii)
    gDistance(hawaii_out,byid = T)
                  1         2         3         4         5         6         7
        1      0.00  26246.85 189520.49 299489.75 333273.01 367584.38 475015.98
        2  26246.85      0.00 117413.58 228699.22 263368.91 296349.18 406123.52
        3 189520.49 117413.58      0.00  41995.90  76905.51 110099.62 219964.68
        4 299489.75 228699.22  41995.90      0.00  13568.15  12738.74 129211.73
        5 333273.01 263368.91  76905.51  13568.15      0.00  14052.47 115235.51
        6 367584.38 296349.18 110099.62  12738.74  14052.47      0.00  46840.79
        7 475015.98 406123.52 219964.68 129211.73 115235.51  46840.79      0.00
    

    (Of course you'll need to check which ID corresponds with which island).

    This runs reasonably quickly for Hawaii and that shapefile, but this function can easily take a very long time if the number of islands (polygons) is high.

    EDIT: Adding an approach that can identify and measure the most extreme points of an island pair

    library(leaflet)
    library(leaflet.extras)
    library(sp)
    library(tidyr)
    library(dplyr)
    
    start <- Sys.time()
    ##Need the original long/lat shapefile for this
    hawaii_ll = states[states$STATE_NAME == "Hawaii", ] 
    hawaii_out_ll <- disaggregate(hawaii_ll) ##Separate the islands
    
    ##Exact the original coordinates from the polygon file. Since we're dealing with straight line polygons, the furthest distance between any two islands should lie on a vertice.
    IslandCoords <- list()
    for(i in rownames(hawaii_out_ll@data)){
    IslandCoords <- hawaii_out_ll@polygons[[as.numeric(i)]] %>% 
      slot("Polygons") %>% 
      .[[1]] %>% 
      slot("coords") %>% 
      cbind.data.frame(.,"Island"=i,"coord_ID"=paste0(i,"_",1:nrow(.)),stringsAsFactors=F) %>% 
      bind_rows(IslandCoords,.)
    }
    colnames(IslandCoords)[1:2] <- c("longitude","latitude")
    
    ##Double for loop will calculate max distance for each pair of islands in the dataset
    all_res <- list() ##Initialise list for final results
    for (island1 in unique(IslandCoords$Island)){
      temp_res <- list() ##List for temp results
    for (island2 in unique(IslandCoords$Island)) {
      if(island1!=island2){   ##Avoid running the loop on the same island
    ##subset points to a single pair of islands
    IslandCoordsTemp <- IslandCoords[IslandCoords$Island==island1|
                                       IslandCoords$Island==island2,]  
    ## Derive the convex hull (outermost points) for the island pair
    IslandCoordsTemp <- IslandCoordsTemp[chull(IslandCoordsTemp[,1:2]),] 
    
    ##Calculate distance matrix between points, tidy it and order by distance
    IslandTemp_scores <- spDists(as.matrix(IslandCoordsTemp[,1:2]),longlat=T) %>% 
      data.frame("coord_ID_A"=IslandCoordsTemp$coord_ID,
                 "Island_A"=IslandCoordsTemp$Island,
                 .) %>% 
      gather("coord_ID_B","distance",3:ncol(.)) %>% 
      arrange(desc(distance))
    
    ##Next two lines are to check and filter the data to ensure the maximum distance picked out is actually between points on differing islands
    IslandTemp_scores$coord_ID_B <- IslandCoordsTemp$coord_ID[as.numeric(gsub("X","",IslandTemp_scores$coord_ID_B))] 
    IslandTemp_scores$Island_B <- IslandCoordsTemp$Island[match(IslandTemp_scores$coord_ID_B,IslandCoordsTemp$coord_ID)]
    IslandTemp_scores <- IslandTemp_scores %>% 
      filter(IslandTemp_scores$Island_A != IslandTemp_scores$Island_B) %>% 
      head(1)
    
    ##Place results in temp list
    temp_res <- bind_rows(temp_res, 
                data.frame("Island1"=island1, 
               "Island2"=island2,
               "distance"=IslandTemp_scores$distance,
               stringsAsFactors = F))
    
    ##Use this to make sure the code is running as expected
    print(paste(island1,island2))
    }
    }
      ##Bind all results into one data frame
      all_res <- bind_rows(all_res,temp_res)
     }
    
    ##Spread into matrix (if needed, just to match gDistance's appearance)
    all_res_spread <- all_res %>% spread(key = Island2,value = distance,fill = 0)
    

    Units are in km.

      Island1        1        2        3         4         5        6        7
    1       1   0.0000 104.1285 272.4133 372.96831 374.27478 457.4984 624.7161
    2       2 104.1285   0.0000 235.0730 334.42077 338.90971 420.2209 592.3716
    3       3 272.4133 235.0730   0.0000 168.24874 174.68062 254.1973 430.2157
    4       4 372.9683 334.4208 168.2487   0.00000  65.76585 143.4336 319.7396
    5       5 374.2748 338.9097 174.6806  65.76585   0.00000 112.0591 283.6706
    6       6 457.4984 420.2209 254.1973 143.43355 112.05911   0.0000 258.1099
    7       7 624.7161 592.3716 430.2157 319.73960 283.67057 258.1099   0.0000
    

    You can use leaflet and the addMeasures add-on from leaflet.extras to sense check the results.

    ##Can use this to sense-check/confirm the results
    leaflet(hawaii_out_ll) %>% addPolygons(label = row.names(hawaii_out_ll@data)) %>% 
      addProviderTiles(providers$CartoDB) %>% addMeasure(primaryLengthUnit = "metre") %>% 
      addMarkers(data=IslandCoordsTemp)
    

    Leaflet Measure