r algorithm shapefile nearest-neighbor point-in-polygon

Color every point in a polygon depending on another dataset of points, in R

Problem:

1.) I have a shapefile that looks like this:

Shapefile

Extreme values for coordinates are: xmin = 300,000, xmax = 620,000, ymin = 31,000 and ymax = 190,000.

2.) I have a dataset of approx. 2mio points (every point is inside the given polygon) - each one is in one of a 5 different categories.

Now, for every point inside the border (distance between points has to be 10, so that would give us 580,800,000 points) I want to determine color, depending on a category of the nearest point in a dataset.

In the end I would like to draw a ggplot, where the color of every point is dependent on its category (so I'll use 5 different colors).

What I have so far:

My ideas for solution are not optimized and it takes R forever to determine categories for every point inside the polygon.

1.) I created a new dataset with points in a shape of a rectangle with extreme values of coordinates, with 10 units between points. From a new dataset I selected points that have fallen inside the border of polygons (with a function pnt.in.poly from package SDMTools). Then I wanted to find nearest points (from dataset) of every point in a polygon and determined category, but I never manage to get a subset from 580,800,000 points (obviously).

2.) I tried to take 2mio points and color an area around them, dependent on their category, but that did not work right.

I know that it is not possible to plot so many points and see the difference between plot with 200,000,000 points and plot with 1,000,000 points, but I would like to have an accurate coloring when zooming (drawing) only one little spot in a polygon (size of 100 x 100 for example).

Question: Is there any better a way of coloring so many points in a polygon (with creating a new shapefile or grouping points)?

Thank you for your ideas!

Solution

It’s really helpful if you include some data with your question, even (especially) if it’s a toy data set. As you don’t, I’ve made a toy example. First, I define a simple shape data frame and a data frame of synthetic data that includes x, y, and grp (i.e., a categorical variable with 5 levels). I crop the latter to the former and plot the results,

# Dummy shape function
df_shape <- data.frame(x = c(0, 0.5, 1, 0.5, 0),
                    y = c(0, 0.2, 1, 0.8, 0))

# Load library
library(ggplot2)
library(sgeostat) # For in.polygon function

# Data frame of synthetic data: random [x, y] and category (grp)
df_synth <- data.frame(x = runif(500),
                       y = runif(500),
                       grp = factor(sample(1:5, 500, replace = TRUE)))

# Remove points outside polygon
df_synth <- df_synth[in.polygon(df_synth$x, df_synth$y, df_shape$x, df_shape$y), ]

# Plot shape and synthetic data
g <- ggplot(df_shape, aes(x = x, y = y)) + geom_path(colour = "#FF3300", size = 1.5)
g <- g + ggthemes::theme_clean()
g <- g + geom_point(data = df_synth, aes(x = x, y = y, colour = grp))
g

Next, I create a regular grid and crop that using the polygon.

# Create a grid
df_grid <- expand.grid(x = seq(0, 1, length.out = 50),
                       y = seq(0, 1, length.out = 50))

# Check if grid points are in polygon
df_grid <- df_grid[in.polygon(df_grid$x, df_grid$y, df_shape$x, df_shape$y), ]

# Plot shape and show points are inside
g <- ggplot(df_shape, aes(x = x, y = y)) + geom_path(colour = "#FF3300", size = 1.5)
g <- g + ggthemes::theme_clean()
g <- g + geom_point(data = df_grid, aes(x = x, y = y))
g

To classify each point on this grid by the nearest point in the synthetic data set, I use knn or k-nearest-neighbours with k = 1. That gives something like this.

# Classify grid points according to synthetic data set using k-nearest neighbour
df_grid$grp <- class::knn(df_synth[, 1:2], df_grid, df_synth[, 3])

# Show categorised points
g <- ggplot()
g <- g + ggthemes::theme_clean()
g <- g + geom_point(data = df_grid, aes(x = x, y = y, colour = grp))
g

So, that's how I'd address that part of your question about classifying points on a grid.

The other part of your question seems to be about resolution. If I understand correctly, you want the same resolution even if you're zoomed in. Also, you don't want to plot so many points when zoomed out, as you can't even see them. Here, I create a plotting function that lets you specify the resolution. First, I plot all the points in the shape with 50 points in each direction. Then, I plot a subregion (i.e., zoom), but keep the same number of points in each direction the same so that it looks pretty much the same as the previous plot in terms of numbers of dots.

res_plot <- function(xlim, xn, ylim, yn, df_data, df_sh){
  # Create a grid
  df_gr <- expand.grid(x = seq(xlim[1], xlim[2], length.out = xn),
                         y = seq(ylim[1], ylim[2], length.out = yn))

  # Check if grid points are in polygon
  df_gr <- df_gr[in.polygon(df_gr$x, df_gr$y, df_sh$x, df_sh$y), ]

  # Classify grid points according to synthetic data set using k-nearest neighbour
  df_gr$grp <- class::knn(df_data[, 1:2], df_gr, df_data[, 3])

  g <- ggplot()
  g <- g + ggthemes::theme_clean()
  g <- g + geom_point(data = df_gr, aes(x = x, y = y, colour = grp))
  g <- g + xlim(xlim) + ylim(ylim)
  g
}

# Example plot
res_plot(c(0, 1), 50, c(0, 1), 50, df_synth, df_shape)

# Same resolution, but different limits
res_plot(c(0.25, 0.75), 50, c(0, 1), 50, df_synth, df_shape)

^{Created on 2019-05-31 by the reprex package (v0.3.0)}

Hopefully, that addresses your question.