Search code examples
rgistemporal

Counting spatial points within a polygon according to a temporal window


Overview

Using R, I would like to count the number of points within a polygon according to a specific criterion (a temporal window).

I have the following data:

  1. Geo-located survey data that include the date of the survey interview. Thus, I am able to pinpoint exactly when and where each survey was conducted and to map them out across the United States.
  2. Geo-located data about political rallies across the United States. These also include the date.

Using QGIS, I created a set of circular 50 mile buffers around each survey respondent. My goal is to count the number of political rallies that falls within each respondent's "buffer" within a specific time frame preceding the interview. The 50 mile buffers created in QGIS retain all variables of the original data, including the date of the interview.

Data

Using QGIS, I created some mock shapefiles with dates and locations to aid in replication.

Approach

I am trying to use GISTools::poly.counts to count the number of rallies within different temporal windows (30 days, 90 days, etc.).

Generally, to count the number of points within a polygon, I would simply use:

count <- GISTools::poly.counts(rallies, buffer)

This gives me the total number of rallies that occur within each buffer, but doesn't allow me to specify temporal windows. For example, it would be great to develop a count of the number of rallies within a buffer for the 30 days preceding the survey interview as well as the 90 days preceding the interview.

Remember, each polygon within my buffer shapefile has a different date of interview.

Here's what I've tried, but it's not working:

buffer$count_30 <- GISTools::poly.counts(
    rallies[buffer$date - rallies$date > 0 & buffer$date - rallies$date <= 30], 
    buffer)

I get the following error:

Error in `[.data.frame`(x@data, i, j, ..., drop = FALSE) : 
  undefined columns selected
In addition: Warning messages:
1: In unclass(time1) - unclass(time2) :
  longer object length is not a multiple of shorter object length
2: In unclass(time1) - unclass(time2) :
  longer object length is not a multiple of shorter object length

What is the correct way to accomplish this?


Solution

  • Another answer using sf, but this time using spatial joins and dplyr for filtering etc.

    library(tidyverse)
    library(sf)
    
    
    rallies <- read_sf('Downloads/stack_ex_q/rallies.shp')
    # Here I don't use the supplied buffer, but make one according to the data
    #fifty_buff <- read_sf('Downloads/stack_ex_q/rallies.shp') 
    surveys <- read_sf('Downloads/stack_ex_q/surveys.shp')
    
    
    # Transform to a crs using meters as a distance & make date col a proper date
    rallies <- st_transform(rallies, crs = 2163) %>% 
      mutate(date = as.Date(date))
    surveys <- st_transform(surveys, crs = 2163) %>%
      mutate(date = as.Date(date))
    
    # make a buffer w/ 50 mile radius (80467 meters), not used but useful for visualization
    buffer_50mi <- st_buffer(surveys, dist = 80467)
    

    Plot the data for a quick visual check:

    library(mapview)
    mapview(rallies, col.regions = 'purple') + 
      mapview(surveys, col.regions = 'black') + 
      mapview(buffer_50mi, col.regions = 'green')
    

    Visual check of the data

    Join the data using st_is_within_distance, using 80467m = 50 miles.

    joined <- surveys %>%
      st_join(rallies, join = st_is_within_distance, 80467)
    
    head(joined)
    
    Simple feature collection with 6 features and 4 fields
    geometry type:  POINT
    dimension:      XY
    bbox:           xmin: 1350401 ymin: -556609 xmax: 1438586 ymax: -455743.1
    projected CRS:  NAD27 / US National Atlas Equal Area
    # A tibble: 6 x 5
       id.x date.x                geometry  id.y date.y    
      <dbl> <date>             <POINT [m]> <dbl> <date>    
    1     1 2020-04-26   (1350401 -556609)    16 2020-02-19
    2     1 2020-04-26   (1350401 -556609)    17 2020-05-12
    3     2 2020-03-27 (1438586 -455743.1)     7 2020-02-18
    4     2 2020-03-27 (1438586 -455743.1)    15 2020-07-01
    5     2 2020-03-27 (1438586 -455743.1)    15 2020-03-28
    6     3 2020-06-12 (1352585 -479940.5)    15 2020-07-01
    

    The .x columns are from the survey sf object & the .y columns are from the rallies sf object. Geometry is retained from the survey sf.

    Using dplyr's filter, group_by, and mutate, find what you're looking for. The code below counts rallies within 50 miles and +/- 60 days by survey point as an example.

    joined_60days <- joined %>% 
      group_by(id.x) %>%
      mutate(date_diff = as.numeric(date.x - date.y)) %>%
      filter(!is.na(date_diff)) %>%  ## remove survey points with no rallies in 50mi/60d
      filter(abs(date_diff) <= 60) %>%
      group_by(id.x) %>%
      count()
    
    head(joined_60days)
    
    Simple feature collection with 4 features and 2 fields
    geometry type:  POINT
    dimension:      XY
    bbox:           xmin: 1268816 ymin: -556609 xmax: 1438586 ymax: -322572.4
    projected CRS:  NAD27 / US National Atlas Equal Area
    # A tibble: 4 x 3
       id.x     n            geometry
      <dbl> <int>         <POINT [m]>
    1     1     1   (1350401 -556609)
    2     2     2 (1438586 -455743.1)
    3     3     1 (1352585 -479940.5)
    4     4     2 (1268816 -322572.4)
    

    Quick visual check:

    enter image description here