Search code examples
rrastershapefileqgis

r - Join data frame coordinates by shapefile regions aka Join Attributes by Location


I have a large data set, loaded in R as a data.frame. It contains observations associated with coordinate points (lat/lon).

I also have a shape file of North America.

In the empty column (NA filled) in my data frame, labelled BCR, I want to insert the region name which each coordinate falls into according to the shapefile.

I know how to do this is QGIS using the Vector > Data Management Tools > Join Attributes by Location

The shapefile can be downloaded by clicking HERE.

My data, right now, looks like this (a sample):

LATITUDE    LONGITUDE   Year    EFF n   St  PJ  day BCR
50.406752   -104.613    2009    1   0   SK  90  2   NA
50.40678    -104.61256  2009    2   0   SK  120 3   NA
50.40678    -104.61256  2009    2   1   SK  136 2   NA
50.40678    -104.61256  2009    3   2   SK  149 4   NA
43.0026385  -79.2900467 2009    2   0   ON  112 3   NA
43.0026385  -79.2900467 2009    2   1   ON  122 3   NA

But I want it to look like this:

LATITUDE    LONGITUDE   Year    EFF n   St  PJ  day BCR
50.406752   -104.613    2009    1   0   SK  90  2   Prairie Potholes
50.40678    -104.61256  2009    2   0   SK  120 3   Prairie Potholes
50.40678    -104.61256  2009    2   1   SK  136 2   Prairie Potholes
50.40678    -104.61256  2009    3   2   SK  149 4   Prairie Potholes
43.0026385  -79.2900467 2009    2   0   ON  112 3   Lower Great Lakes/St.Lawrence Plain
43.0026385  -79.2900467 2009    2   1   ON  122 3   Lower Great Lakes/St.Lawrence Plain

Notice the BCR column is now filled with the appropriate BCR region name.

My code so far is just importing and formatting the data and shapefile:

library(rgdal)
library(proj4)
library(sp)
library(raster)

# PFW data, full 2.5m observations
df = read.csv("MyData.csv")

# Clearning out empty coordinate data
pfw = df[(df$LATITUDE != 0) & (df$LONGITUDE != 0) & (!is.na(df$LATITUDE)) & (!is.na(df$LATITUDE)),]

# Creating a new column to be filled with associated Bird Conservation Regions
pfw["BCR"] = NA

# Making a duplicate data frame to conserve data
toSPDF = pfw

# Ensuring spatial formatting
#coordinates(toSPDF) = ~LATITUDE + LONGITUDE
SPDF <- SpatialPointsDataFrame(toSPDF[,c("LONGITUDE", "LATITUDE"),],
                                  toSPDF,
                                  proj4string = CRS("+init=epsg:4326"))

# BCR shape file, no state borders
shp = shapefile("C:/Users/User1/Desktop/BCR/BCR_Terrestrial_master_International.shx")
spPoly = spTransform(shp, CRS("+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"))

# Check
isTRUE(proj4string(spPoly) == proj4string(SPDF))

# Trying to join attributes by location
  #try1 = point.in.polygon(spPoly, SPDF) # Sounds good doesn't work
  #a.data <- over(SPDF, spPoly[,"BCRNAME"]) # Error: cannot allocate vector of size 204.7 Mb

Solution

  • I think you want to do a spatial query with points and polygons. That is to assign polygon attributes to the corresponding points. You can do that like this:

    Example data

    library(terra)
    f <- system.file("ex/lux.shp", package="terra")
    polygons <- vect(f)
    points <- spatSample(v, 10)
    

    Solution

    e <- extract(polygons, points)
    
    e
    #   id.y ID_1       NAME_1 ID_2           NAME_2 AREA    POP
    #1     1    3   Luxembourg    9 Esch-sur-Alzette  251 176820
    #2     2    3   Luxembourg    9 Esch-sur-Alzette  251 176820
    #3     3    2 Grevenmacher    6       Echternach  188  18899
    #4     4    1     Diekirch    2         Diekirch  218  32543
    #5     5    3   Luxembourg    9 Esch-sur-Alzette  251 176820
    #6     6    1     Diekirch    4          Vianden   76   5163
    #7     7    3   Luxembourg   11           Mersch  233  32112
    #8     8    2 Grevenmacher    7           Remich  129  22366
    #9     9    1     Diekirch    3          Redange  259  18664
    #10   10    3   Luxembourg    9 Esch-sur-Alzette  251 176820
    

    With the older spatial packages you can use raster::extract or sp::over.

    Example data:

    library(raster)
    pols <- shapefile(system.file("external/lux.shp", package="raster")) 
    set.seed(20180121) 
    pts <- data.frame(coordinates(spsample(pols, 5, 'random')), name=letters[1:5])
    plot(pols); points(pts)
    

    Solution:

    e <- extract(pols, pts[, c('x', 'y')]) 
    pts$BCR <- e$NAME_2 
    
    pts
    #         x        y name              BCR
    #1 6.009390 49.98333    a            Wiltz
    #2 5.766407 49.85188    b          Redange
    #3 6.268405 49.62585    c       Luxembourg
    #4 6.123015 49.56486    d       Luxembourg
    #5 5.911638 49.53957    e Esch-sur-Alzette