Search code examples
rpolygonmgcv

Assign polygon to data point in R dataframe


I have a two data frames:

  • points contains a series of points with x, y coordinates.
  • poly contains coordinates of two polygons (I have over 100 in reality, but keeping it simple here).

I want to be able to add to the dataframe points an extra column called Area which contains the name of the polygon the point is in.

poly <- data.frame(
pol= c("P1", "P1","P1","P1","P1","P2","P2","P2","P2", "P2"),
x=c(4360, 7273, 7759, 4440, 4360, 8720,11959, 11440,8200, 8720),
y=c(1009, 9900,28559,28430,1009,9870,9740,28500,28040,9870))

points <- data.frame(
       object = c("P1", "P1","P1","P2","P2","P2"),
       timestamp= c(1485670023468,1485670023970, 1485670024565, 1485670025756,1485670045062, 1485670047366),
       x=c(6000, 6000, 6050, 10000, 10300, 8000),
       y=c(10000, 20000,2000,5000,20000,2000))

plot(poly$x, poly$y, type = 'l')
text(points$x, points$y, labels=points$object )

So essentially in this example the first 2 rows should have Area= "P1" while the last point should be blank as the point is not contained in any polygon.

I have tried using the function in.out but haven't been able to build my data frame as I described.

Any help is very appreciated!


Solution

  • Although this is using a for loop, it is practically quite fast.

    library(mgcv)
    
    x <- split(poly$x, poly$pol)
    y <- split(poly$y, poly$pol)
    
    todo <- 1:nrow(points)
    Area <- rep.int("", nrow(points))
    pol <- names(x)
    
    # loop through polygons
    for (i in 1:length(x)) {
      # the vertices of i-th polygon
      bnd <- cbind(x[[i]], y[[i]])
      # points to allocate
      xy <- with(points, cbind(x[todo], y[todo]))
      inbnd <- in.out(bnd, xy)
      # allocation
      Area[todo[inbnd]] <- pol[i]
      # update 'todo'
      todo <- todo[!inbnd]
      }
    
    points$Area <- Area
    

    Two reasons for its efficiency:

    • for loop is through the polygons, not points. So if you have 100 polygons and 100000 points to allocate, the loop only has 100 iterations not 100000. Inside each iteration, the vectorization power of C function in.out is exploited;
    • It works in a progressive way. Once a point has been allocated, it will be excluded from allocation later. todo variable controls the points to allocate through the loop. As it goes, the working set is reducing.