Search code examples
rspatialshapefileterrargdal

Add region column in movement data - alternative to retired rgdal package?


I have a movement dataset and a map of Canada as a shapefile (.shp). An example of the movement dataset looks like this (the coordinates are projected in ESPG:5321):

longitude latitude DateTimeRounded
459663.3  7181890      2007-09-10
459734.2  7181938      2007-09-11
459680.5  7181933      2007-09-12
459640.1  7181893      2007-09-13
459605.2  7181897      2007-09-14
459928.7  7182175      2007-09-15
459855.1  7182104      2007-09-16

I'm not sure how to attach or show the shapefile, but if any one has suggestions, please let me know!

My goal is to create a new column in my dataset called "region", which indicates in which province my each data point is. If a datapoint is outside the borders of any province (i.e., in the sea), I want the associated entry in the region column to be NA.

The final dataset would look something like this:

longitude latitude DateTimeRounded  region
459663.3  7181890      2007-09-10   Manitoba
459734.2  7181938      2007-09-11   Manitoba
459680.5  7181933      2007-09-12   Manitoba
459640.1  7181893      2007-09-13   NA
459605.2  7181897      2007-09-14   NA
459928.7  7182175      2007-09-15   NA
459855.1  7182104      2007-09-16   Ontario

Originally, I used the package rgdal to do this, but since it has been retired I'm not sure how to go about it anymore. Would any one know how I can do this using other spatial packages like sp or terra?

This is what the original code looked like for reference:

setwd("CanadaMap/") # Directory for Canada Map
CANmap = readOGR(dsn=".", layer= "canada 5321")#File path to Canada Map
coordinates(df) = ~longitude + latitude #change longitude/latitude to what the lat/long columns are in your data frame
proj4string(df) <- CRS("+init=epsg:5321") #Use the correct reference system for your data
df <- spTransform(df, CRS("+proj=lcc +lat_1=44.5 +lat_2=54.5 +lat_0=0 +lon_0=-84 +x_0=1000000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs")) #denotes what CRS to use for df (same as mapCRS above)
df$region = over(df,CANmap)$NAME #adds column for region (which was province) 
df<-as.data.frame(df)

Solution

  • The sf package may work for your case.

    library(sf)
    library(dplyr)
    
    # Load example North Carolina data set from sf package
    example_shapefile <- st_read(system.file("shape/nc.shp", package="sf"))
    
    # Generate a bounding box
    example_bbox <- st_bbox(example)
    
    # This is just to create random fake coordinates within the bounding box
    example_coords <- data.frame(
      id = 1:40,
      long = runif(40,example_bbox[1],example_bbox[3]),
      lat = runif(40,example_bbox[2],example_bbox[4])
    ) |>
      st_as_sf(
        coords = c("long", "lat")
      )
    
    # Set CRS of our fake data to match our example shapefile
    # Check example shapefile CRS with st_crs(example_shapefile)
    st_crs(example_coords) <- 4267
    

    Once we have our data prepared, then it's only a few more lines of code:

    # Get our results
    result <- st_join(
      x = example_coords,
      y = example_shapefile,
      join = st_within,
      left = TRUE
    ) |>
      select(
        id, NAME, geometry
      )
    
    head(result)
    
    Simple feature collection with 6 features and 2 fields
    Geometry type: POINT
    Dimension:     XY
    Bounding box:  xmin: -83.27518 ymin: 33.9193 xmax: -75.91465 ymax: 36.58093
    Geodetic CRS:  NAD27
      id    NAME                   geometry
    1  1 Sampson POINT (-78.17492 35.05357)
    2  2  Person POINT (-79.03105 36.47649)
    3  3    <NA> POINT (-76.11417 34.39561)
    4  4    <NA> POINT (-83.27518 36.58093)
    5  5    <NA> POINT (-75.91465 34.83266)
    6  6    <NA>  POINT (-77.34693 33.9193)
    

    In this case, NAME corresponds to county in the North Carolina shapefile, but in your dataset would correspond to Region. Our spatial join specifies left = TRUE to signify a left join to keep all of our lat/lon data and fill in NAs if there's no matching region that the coordinates fall within from our shapefile. Note that the sf package uses data.frames with a geometry list column to store spatial data. Depending on what else you need to do afterwards, this may be fine or you may need to transform the geometry list column back into separate lat/lon columns.