Tags: r, gis, spatstat

R Spatstat: Identify nearest neighbours for further use


I have a data frame with 488 GPS points (longitude and latitude). For each of the 488 points I would like to find its two closest neighbours.

So far I have created a point pattern object and computed the distances to the two nearest points (below). However, I would like to go a step further and identify these nearest points by their ID ("hhid") from the original dataset.

Currently, my script looks like this:

library(spatstat)

# 1. store x and y coords in two vectors
lon <- data$longitude
lat <- data$latitude

# 2. create two vectors xrange and yrange with the dimensions of the rectangle that contains all points
xrange <- range(lon, na.rm = TRUE)
yrange <- range(lat, na.rm = TRUE)

# 3. create ppp
lf <- ppp(lon, lat, xrange, yrange)

plot(lf)

# distances to the 1st and 2nd nearest neighbour of each point
nndist(lf, k = 1:2)

This gives (first 5 rows shown):

             dist.1       dist.2
  [1,] 1.426925e-03 0.0017007414
  [2,] 1.017287e-03 0.0015574895
  [3,] 6.502012e-04 0.0010172867
  [4,] 6.502012e-04 0.0007202307
  [5,] 7.202307e-04 0.0010472445
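The distances above are in degrees, because ppp() and nndist() treat longitude and latitude as planar x and y coordinates. At about 12°N one degree of longitude covers only about cos(12°) ≈ 0.98 of the ground distance of one degree of latitude, so these values are slightly distorted and are not in metres. If metres are wanted, one option is great-circle (haversine) distances. The sketch below is a minimal base-R version; the helper names haversine_m and pairwise_m are hypothetical, the spherical Earth radius of 6371 km is an assumption, and only the first five points of the question's data are used.

```r
# Great-circle distance in metres between (lon1, lat1) and (lon2, lat2),
# assuming a spherical Earth with mean radius 6371 km (vectorised)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding just above 1
}

# n x n matrix of pairwise great-circle distances
pairwise_m <- function(lon, lat) {
  outer(seq_along(lon), seq_along(lon),
        function(i, j) haversine_m(lon[i], lat[i], lon[j], lat[j]))
}

# First five points of the question's data, for illustration
lon <- c(-1.478302479, -1.477469802, -1.476488709, -1.476146936, -1.47547996)
lat <- c(12.10552216, 12.10700512, 12.10673618, 12.10618305, 12.10645485)

D <- pairwise_m(lon, lat)
diag(D) <- Inf                      # a point is not its own neighbour
nn <- t(apply(D, 1, order))[, 1:2]  # indices of 1st and 2nd nearest
nn_dist <- cbind(D[cbind(seq_along(lon), nn[, 1])],
                 D[cbind(seq_along(lon), nn[, 2])])  # distances in metres
```

For 488 points the full matrix has about 238,000 entries, which is small. In this five-point subset the neighbour rankings happen to match the degree-based ones; they can differ when two candidate neighbours lie at similar distances in different directions.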
 

But I would like to link this back to the "hhid" column of the original dataset, to get something like this:

  hhid         dist.1  dist.1.hhid         dist.2     dist.2.hhid
  1    1.426925e-03             7  0.0017007414                 3
  2    1.017287e-03             6  0.0015574895                 4
  3    6.502012e-04            10  0.0010172867                 5
  4    6.502012e-04             2  0.0007202307                 8
  5    7.202307e-04             1  0.0010472445                13
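For what it's worth, spatstat can return the neighbour identities directly: nnwhich() accepts the same k argument as nndist() and gives, for each point, the row index of its k-th nearest neighbour, which can then index the hhid column. A minimal sketch, assuming the question's data frame is called data; only its first five rows are typed in here so the example runs on its own.

```r
library(spatstat)  # provides ppp(), nndist() and nnwhich() (via spatstat.geom)

# Stand-in for the question's data frame: first five rows of the dput output
data <- data.frame(
  hhid      = c(2004L, 2006L, 2009L, 2012L, 2013L),
  longitude = c(-1.478302479, -1.477469802, -1.476488709, -1.476146936, -1.47547996),
  latitude  = c(12.10552216, 12.10700512, 12.10673618, 12.10618305, 12.10645485)
)

# Point pattern in a rectangular window that just encloses the points
lf <- ppp(data$longitude, data$latitude,
          xrange = range(data$longitude), yrange = range(data$latitude))

d <- nndist(lf, k = 1:2)   # distances to 1st and 2nd nearest neighbour
w <- nnwhich(lf, k = 1:2)  # row indices of those neighbours

# Translate the row indices into hhid values
result <- data.frame(
  hhid        = data$hhid,
  dist.1      = d[, 1],
  dist.1.hhid = data$hhid[w[, 1]],
  dist.2      = d[, 2],
  dist.2.hhid = data$hhid[w[, 2]]
)
result
```

Because nndist() and nnwhich() are computed on the same ppp object, distances and indices line up row for row. As with nndist() on longitude/latitude, the distances are in degrees.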

First 20 rows of the original dataset:

structure(list(hhid = c(2004L, 2006L, 2009L, 2012L, 2013L, 2020L, 
2022L, 2023L, 2028L, 2029L, 2035L, 2036L, 2043L, 2046L, 2047L, 
2059L, 2062L, 2063L, 2065L, 2066L), longitude = c(-1.478302479, 
-1.477469802, -1.476488709, -1.476146936, -1.47547996, -1.475799441, 
-1.475903392, -1.476232767, -1.476053953, -1.477196693, -1.476906657, 
-1.478778243, -1.480723381, -1.433436394, -1.433033824, -1.428791046, 
-1.431989908, -1.432058454, -1.43134892, -1.430848002), latitude = c(12.10552216, 
12.10700512, 12.10673618, 12.10618305, 12.10645485, 12.10846806, 
12.1080761, 12.10830975, 12.11114883, 12.11076546, 12.11197853, 
12.11345387, 12.10725021, 12.1183548, 12.11699867, 12.11466122, 
12.1154108, 12.11545277, 12.11554337, 12.11567497)), row.names = c(NA, 
20L), class = "data.frame")

Solution

  • This seems to be a good extension of the question posed here. Building on that question's accepted answer and extending it to the two closest neighbors, you could do:

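    library(sp)
    library(rgeos)
    # dput structure in question assigned as "df"
    
    spatialDF <- df
    coordinates(spatialDF) <- ~longitude + latitude
    # n x n matrix of pairwise planar distances (in degrees, as coords are unprojected)
    dists <- gDistance(spatialDF, byid = TRUE)
    # position 1 of each sorted row is the point itself (distance 0, assuming no
    # duplicate locations), so positions 2:3 are the two nearest neighbors;
    # the result is a 2 x n matrix
    min.2dists <- apply(dists, 1, function(x) order(x, decreasing = FALSE)[2:3])
    
    # closest
    df$hhid1 <- df[min.2dists[1,],"hhid"]
    df$dist1 <- apply(dists, 1, function(x) sort(x, decreasing = FALSE)[2])
    
    # second closest
    df$hhid2 <- df[min.2dists[2,],"hhid"]
    df$dist2 <- apply(dists, 1, function(x) sort(x, decreasing = FALSE)[3])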
    

    Output:

    #    hhid longitude latitude hhid1        dist1 hhid2        dist2
    # 1  2004 -1.478302 12.10552  2006 1.700741e-03  2009 0.0021825687
    # 2  2006 -1.477470 12.10701  2009 1.017287e-03  2012 0.0015574895
    # 3  2009 -1.476489 12.10674  2012 6.502012e-04  2006 0.0010172867
    # 4  2012 -1.476147 12.10618  2009 6.502012e-04  2013 0.0007202307
    # 5  2013 -1.475480 12.10645  2012 7.202307e-04  2009 0.0010472445
    # ...
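Note that rgeos was retired in 2023 and has been archived on CRAN, so library(rgeos) may fail on a current R installation. The same planar (degree) distances can be sketched with base R's dist(), with no spatial packages at all. This is a minimal sketch, assuming df is the question's data frame; only its first five rows are typed in for illustration. Setting the diagonal to Inf plays the role of the [2:3] indexing above, which skips each point's zero self-distance.

```r
# Stand-in for the question's data frame: first five rows of the dput output
df <- data.frame(
  hhid      = c(2004L, 2006L, 2009L, 2012L, 2013L),
  longitude = c(-1.478302479, -1.477469802, -1.476488709, -1.476146936, -1.47547996),
  latitude  = c(12.10552216, 12.10700512, 12.10673618, 12.10618305, 12.10645485)
)

# Planar Euclidean distances in degrees, like gDistance() on unprojected points
dists <- as.matrix(dist(df[, c("longitude", "latitude")]))
diag(dists) <- Inf  # a point is not its own neighbour

# For each row, column positions of the two smallest distances (n x 2 matrix)
nn <- t(apply(dists, 1, function(x) order(x)[1:2]))

rows <- seq_len(nrow(df))
df$hhid1 <- df$hhid[nn[, 1]]
df$dist1 <- dists[cbind(rows, nn[, 1])]
df$hhid2 <- df$hhid[nn[, 2]]
df$dist2 <- dists[cbind(rows, nn[, 2])]
df
```

For these five rows the neighbour hhids and distances agree with the output shown above (for example, 2006 → 2009 at about 1.017287e-03).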