I have a two data frames:
points
contains a series of points with x, y
coordinates.poly
contains coordinates of two polygons (I have over 100 in reality, but keeping it simple here).I want to be able to add to the dataframe points
an extra column called Area
which contains the name of the polygon the point is in.
poly <- data.frame(
pol= c("P1", "P1","P1","P1","P1","P2","P2","P2","P2", "P2"),
x=c(4360, 7273, 7759, 4440, 4360, 8720,11959, 11440,8200, 8720),
y=c(1009, 9900,28559,28430,1009,9870,9740,28500,28040,9870))
points <- data.frame(
object = c("P1", "P1","P1","P2","P2","P2"),
timestamp= c(1485670023468,1485670023970, 1485670024565, 1485670025756,1485670045062, 1485670047366),
x=c(6000, 6000, 6050, 10000, 10300, 8000),
y=c(10000, 20000,2000,5000,20000,2000))
plot(poly$x, poly$y, type = 'l')
text(points$x, points$y, labels=points$object )
So essentially in this example the first 2 rows should have Area= "P1"
while the last point should be blank as the point is not contained in any polygon.
I have tried using the function in.out
but haven't been able to build my data frame as I described.
Any help is very appreciated!
Although this is using a for
loop, it is practically quite fast.
library(mgcv)
x <- split(poly$x, poly$pol)
y <- split(poly$y, poly$pol)
todo <- 1:nrow(points)
Area <- rep.int("", nrow(points))
pol <- names(x)
# loop through polygons
for (i in 1:length(x)) {
# the vertices of i-th polygon
bnd <- cbind(x[[i]], y[[i]])
# points to allocate
xy <- with(points, cbind(x[todo], y[todo]))
inbnd <- in.out(bnd, xy)
# allocation
Area[todo[inbnd]] <- pol[i]
# update 'todo'
todo <- todo[!inbnd]
}
points$Area <- Area
Two reasons for its efficiency:
for
loop is through the polygons, not points. So if you have 100 polygons and 100000 points to allocate, the loop only has 100 iterations not 100000. Inside each iteration, the vectorization power of C function in.out
is exploited;todo
variable controls the points to allocate through the loop. As it goes, the working set is reducing.