I have a lon/lat points, The problem is I don't have lines ID or sort id within the lines. So I want to make multi-lines from the points following criteria of the distance between points. For example in the image here , at least 4 lines should be created.
How can I achieve that?
I don’t think there is an easy general solution to this problem. Here is
some inspiration using spatstat
to get some of the way to the goal:
library(spatstat)
Generate test data:
d1 <- data.frame(x=1:9, y=-4)
d2 <- data.frame(x=1:9, y=0)
d3 <- data.frame(x=1:9, y=4)
X <- as.ppp(rbind(d1,d2,d3), W = owin(c(0,10), c(-5,5)))
set.seed(42)
X <- rjitter(X, 0.5)
plot(X)
Find connected components and within each component connect every point with the two closest neighbours:
Xcomp <- connected.ppp(X, R = 2)
Xcomp <- split(Xcomp)
neighbours <- list()
line_list <- list()
for(i in seq_along(Xcomp)){
pts <- Xcomp[[i]]
nn <- nnwhich(pts, k=1:2)
x0 <- c(pts$x, pts$x)
y0 <- c(pts$y, pts$y)
x1 <- c(pts$x[nn[,1]], pts$x[nn[,2]])
y1 <- c(pts$y[nn[,1]], pts$y[nn[,2]])
line_list[[i]] <- psp(x0, y0, x1, y1, window = Window(X))
}
Put components back together and convert to a linear network (linnet
) which
is basically an undirected graph where the nodes have explicit location in
Euclidean space rather than being abstract.
L <- Reduce(superimpose.psp, line_list)
L <- as.linnet(L)
#> Warning: Network is not connected
plot(L)
The remaining task is to find all triangles and delete the longest edge
which is more fiddly. You can use edges2triangles
to find all triangles:
tri <- edges2triangles(L$from, L$to)
tri
#> i j k
#> [1,] 1 2 3
#> [2,] 4 5 6
#> [3,] 7 8 9
#> [4,] 10 11 12
#> [5,] 13 14 15
#> [6,] 16 17 18
#> [7,] 19 20 21
#> [8,] 25 26 27
So e.g. vertices 25,26,27 form a triangle
i <- as.numeric(tri[8,])
Li <- thinNetwork(L, retainvertices = i)
plot(Li)
The triangle has three edges from i to j:
j <- i[c(2,3,1)]
i
#> [1] 25 26 27
j
#> [1] 26 27 25
Distrance matrix between all vertices (overkill, but easily calculated and should only be done once – avoid for big datasets)
D <- pairdist(vertices(L))
Index of longest distance:
long <- which.max(diag(D[i,j]))
long
#> [1] 1
Thus the edge from i[long]
to j[long]
should be deleted
plot(L)
edge <- which(paste(L$from,L$to)==paste(sort(c(i[long],j[long])), collapse = " "))
plot(thinNetwork(L, retainedges = edge), add = TRUE, col = 2, lwd = 1.5)
We should apply this code to all triangles in a loop:
edge <- numeric(nrow(tri))
for(k in seq_len(nrow(tri))){
i <- tri[k,]
j <- i[c(2,3,1)]
long <- which.max(diag(D[i,j]))
edge[k] <- which(paste(L$from,L$to)==paste(sort(c(i[long],j[long])), collapse = " "))
}
Lfinal <- thinNetwork(L, retainedges = -edge)
plot(Lfinal)
If the lines are needed separately we can use connected
:
Lfinal_list <- connected.linnet(Lfinal, what = "components")
Lfinal_list
#> [[1]]
#> Linear network with 9 vertices and 8 lines
#> Enclosing window: rectangle = [0, 10] x [-5, 5] units
#>
#> [[2]]
#> Linear network with 9 vertices and 8 lines
#> Enclosing window: rectangle = [0, 10] x [-5, 5] units
#>
#> [[3]]
#> Linear network with 9 vertices and 8 lines
#> Enclosing window: rectangle = [0, 10] x [-5, 5] units
Finding and deleting triangles could easily be done for each component when constructing the lines rather than at the end when all lines are collected. That would be much more efficint on large datasets, but this works nicely as a proof of concept. Beware of hacks such as the pasting trick above to find the edge number – this may not be very robust and I’m not sure whether it works in all cases.