Search code examples
rspatialr-sp

How to make a SpatialLines object from undefined and unknowon spatial sequence of points in R?


I have a lon/lat points, The problem is I don't have lines ID or sort id within the lines. So I want to make multi-lines from the points following criteria of the distance between points. For example in the image here , at least 4 lines should be created.

How can I achieve that?


Solution

  • I don’t think there is an easy general solution to this problem. Here is some inspiration using spatstat to get some of the way to the goal:

    library(spatstat)
    

    Generate test data:

    d1 <- data.frame(x=1:9, y=-4)
    d2 <- data.frame(x=1:9, y=0)
    d3 <- data.frame(x=1:9, y=4)
    X <- as.ppp(rbind(d1,d2,d3), W = owin(c(0,10), c(-5,5)))
    set.seed(42)
    X <- rjitter(X, 0.5)
    plot(X)
    

    Find connected components and within each component connect every point with the two closest neighbours:

    Xcomp <- connected.ppp(X, R = 2)
    Xcomp <- split(Xcomp)
    neighbours <- list()
    line_list <- list()
    for(i in seq_along(Xcomp)){
      pts <- Xcomp[[i]]
      nn <- nnwhich(pts, k=1:2)
      x0 <- c(pts$x, pts$x)
      y0 <- c(pts$y, pts$y)
      x1 <- c(pts$x[nn[,1]], pts$x[nn[,2]])
      y1 <- c(pts$y[nn[,1]], pts$y[nn[,2]])
      line_list[[i]] <- psp(x0, y0, x1, y1, window = Window(X))
    }
    

    Put components back together and convert to a linear network (linnet) which is basically an undirected graph where the nodes have explicit location in Euclidean space rather than being abstract.

    L <- Reduce(superimpose.psp, line_list)
    L <- as.linnet(L)
    #> Warning: Network is not connected
    plot(L)
    

    The remaining task is to find all triangles and delete the longest edge which is more fiddly. You can use edges2triangles to find all triangles:

    tri <- edges2triangles(L$from, L$to)
    tri
    #>       i  j  k
    #> [1,]  1  2  3
    #> [2,]  4  5  6
    #> [3,]  7  8  9
    #> [4,] 10 11 12
    #> [5,] 13 14 15
    #> [6,] 16 17 18
    #> [7,] 19 20 21
    #> [8,] 25 26 27
    

    So e.g. vertices 25,26,27 form a triangle

    i <- as.numeric(tri[8,])
    Li <- thinNetwork(L, retainvertices = i)
    plot(Li)
    

    The triangle has three edges from i to j:

    j <- i[c(2,3,1)]
    i
    #> [1] 25 26 27
    j
    #> [1] 26 27 25
    

    Distrance matrix between all vertices (overkill, but easily calculated and should only be done once – avoid for big datasets)

    D <- pairdist(vertices(L))
    

    Index of longest distance:

    long <- which.max(diag(D[i,j]))
    long
    #> [1] 1
    

    Thus the edge from i[long] to j[long] should be deleted

    plot(L)
    edge <- which(paste(L$from,L$to)==paste(sort(c(i[long],j[long])), collapse = " "))
    plot(thinNetwork(L, retainedges = edge), add = TRUE, col = 2, lwd = 1.5)
    

    We should apply this code to all triangles in a loop:

    edge <- numeric(nrow(tri))
    for(k in seq_len(nrow(tri))){
      i <- tri[k,]
      j <- i[c(2,3,1)]
      long <- which.max(diag(D[i,j]))
      edge[k] <- which(paste(L$from,L$to)==paste(sort(c(i[long],j[long])), collapse = " "))
    }
    Lfinal <- thinNetwork(L, retainedges = -edge)
    plot(Lfinal)
    

    If the lines are needed separately we can use connected:

    Lfinal_list <- connected.linnet(Lfinal, what = "components")
    Lfinal_list
    #> [[1]]
    #> Linear network with 9 vertices and 8 lines
    #> Enclosing window: rectangle = [0, 10] x [-5, 5] units
    #> 
    #> [[2]]
    #> Linear network with 9 vertices and 8 lines
    #> Enclosing window: rectangle = [0, 10] x [-5, 5] units
    #> 
    #> [[3]]
    #> Linear network with 9 vertices and 8 lines
    #> Enclosing window: rectangle = [0, 10] x [-5, 5] units
    

    Finding and deleting triangles could easily be done for each component when constructing the lines rather than at the end when all lines are collected. That would be much more efficint on large datasets, but this works nicely as a proof of concept. Beware of hacks such as the pasting trick above to find the edge number – this may not be very robust and I’m not sure whether it works in all cases.