I have a dataframe with 488 GPS points (long and lat). For each 488 points I would like to find their 2 closest neighbours.
So far I have created a point pattern object and computed the distance from the nearest two points (below). However, I would like to go a step further and be able to identify these nearest points by their ID from the original dataset.
Currently, my script works like:
# 1. store x and y coords in two vectors
lon <- data$longitude
lat <- data$latitude
# 2. create two vectors xrange and yrange with dimensions of triangle that contain all points
xrange <- range(lon, na.rm=T)
yrange <- range(lat, na.rm=T)
# 3. create ppp
lf <- ppp(lon, lat, xrange, yrange)
plot(lf)
nndist(lf, k = 1:2)
Giving me (example of top 5 results):
dist.1 dist.2
[1,] 1.426925e-03 0.0017007414
[2,] 1.017287e-03 0.0015574895
[3,] 6.502012e-04 0.0010172867
[4,] 6.502012e-04 0.0007202307
[5,] 7.202307e-04 0.0010472445
But I would like to be able to link this back to the "hhid" from the original dataset to something like this:
hhid dist.1 dist.1.hhid dist.2 dist.1.hhid
1 1.426925e-03 7 0.0017007414 3
2 1.017287e-03 6 0.0015574895 4
3 6.502012e-04 10 0.0010172867 5
4 6.502012e-04 2 0.0007202307 8
5 7.202307e-04 1 0.0010472445 13
First 20 rows of original dataset :
structure(list(hhid = c(2004L, 2006L, 2009L, 2012L, 2013L, 2020L,
2022L, 2023L, 2028L, 2029L, 2035L, 2036L, 2043L, 2046L, 2047L,
2059L, 2062L, 2063L, 2065L, 2066L), longitude = c(-1.478302479,
-1.477469802, -1.476488709, -1.476146936, -1.47547996, -1.475799441,
-1.475903392, -1.476232767, -1.476053953, -1.477196693, -1.476906657,
-1.478778243, -1.480723381, -1.433436394, -1.433033824, -1.428791046,
-1.431989908, -1.432058454, -1.43134892, -1.430848002), latitude = c(12.10552216,
12.10700512, 12.10673618, 12.10618305, 12.10645485, 12.10846806,
12.1080761, 12.10830975, 12.11114883, 12.11076546, 12.11197853,
12.11345387, 12.10725021, 12.1183548, 12.11699867, 12.11466122,
12.1154108, 12.11545277, 12.11554337, 12.11567497)), row.names = c(NA,
20L), class = "data.frame")
This seems to be good an extension of the question posed here. Building off of that question's accepted answer to extend to your specific situation examining the closest two neighbors, you could do:
library(sp)
library(rgeos)
# dput structure in question assigned as "df"
spatialDF <- df
coordinates(spatialDF) <- ~longitude + latitude
dists <- gDistance(spatialDF, byid = TRUE)
min.2dists <- apply(dists, 1, function(x) order(x, decreasing = FALSE)[2:3])
# closest
df$hhid1 <- df[min.2dists[1,],"hhid"]
df$dist1 <- apply(dists, 1, function(x) sort(x, decreasing = FALSE)[2])
# second closest
df$hhid2 <- df[min.2dists[2,],"hhid"]
df$dist2 <- apply(dists, 1, function(x) sort(x, decreasing = FALSE)[3])
Output:
# hhid longitude latitude hhid1 dist1 hhid2 dist2
# 1 2004 -1.478302 12.10552 2006 1.700741e-03 2009 0.0021825687
# 2 2006 -1.477470 12.10701 2009 1.017287e-03 2012 0.0015574895
# 3 2009 -1.476489 12.10674 2012 6.502012e-04 2006 0.0010172867
# 4 2012 -1.476147 12.10618 2009 6.502012e-04 2013 0.0007202307
# 5 2013 -1.475480 12.10645 2012 7.202307e-04 2009 0.0010472445
# ...