Search code examples
rgeojsonspatial

Filter point coordinates that fit within a geojson file


I have a dataframe with longitude, latitude and temperature columns, and a separate geojson file downloaded from here representing an area that fits within all the coordinates in my df.

How do I filter the temperature df so that I only have records of areas that fit within the geojson file?

I have no idea where to start on this so haven't attempted anything yet!

For example, in the temperature df:

dput(temperature_df)
structure(list(Lat = c(52, 53.23, 51.37), Long = c(-1.89, -2.45, 
-2.1), Temp = c(13.67, 10.38, -2.45)), class = "data.frame", row.names = c(NA, 
-3L))

I am expecting to return the filtered df:

Lat Long Temp
52.00 -1.89 13.67
53.23 -2.45 10.38

Solution

  • Preamble
    To ensure the validity of your results, it is essential that you know which coordinate system (CRS) forms the basis of the coordinates for both of your datasets. You state that setting a CRS of WGS84/EPSG:4326 correctly plotted your points, but unless you know for a fact that your point coordinates are WGS84, it can cause erroneous results. See this answer for a more in-depth discussion on why you cannot just assume.

    This solution is based on the assumption that the coordinates in your temperature df are WGS84/EPSG:4326. The CRS of the geojson file is OSGB36/British National Grid. Given the data are from the UK ONS, we can be 100% certain the CRS is correct so it will be used as the base CRS in this example.

    You will not achieve the desired result based on the sample data you provided as all of your points are outside the West Midlands region, so I have selected the South West region in order to generate a representative result.

    Load required packages and data

    library(sf)
    library(dplyr)
    library(ggplot2)
    
    # Read and subset geojson file from your working directory, downloaded from
    # https://geoportal.statistics.gov.uk/datasets/7c23fbe8e89d4cf79ff7f2a6058e6200_0/explore
    sf_poly <- read_sf("Regions_December_2023_Boundaries_EN_BFC_4211903881402860639.geojson") %>%
      filter(RGN23NM == "South West")
    
    sf_poly["RGN23NM"]
    # Simple feature collection with 1 feature and 1 field
    # Geometry type: MULTIPOLYGON
    # Dimension:     XY
    # Bounding box:  xmin: 82668.52 ymin: 5336.966 xmax: 435920 ymax: 246052.2
    # Projected CRS: OSGB36 / British National Grid
    # # A tibble: 1 × 2
    #   RGN23NM                                                                  geometry
    #   <chr>                                                          <MULTIPOLYGON [m]>
    # 1 South West (((83962.84 5401.15, 83970.68 5400.71, 83977.49 5404.05, 83985.27 540…
    

    Now create an sf of your point data:

    # Your point data (assumed to be WGS84/EPSG:4326)
    points <- read.table(text = "Lat    Long    Temp
    52.00   -1.89   13.67
    53.23   -2.45   10.38
    51.37   -2.10   -2.45", header = TRUE) %>%
      mutate(ID = 1:n()) # Added a unique ID here for illustrative purposes
    
    # Create sf from your df, assign assumed WGS84 CRS, then project to CRS of sf_poly
    sf_points <- st_as_sf(points, coords = c("Long", "Lat"), crs = 4326) %>%
      st_transform(st_crs(sf_poly))
    
    sf_points
    # Simple feature collection with 3 features and 2 fields
    # Geometry type: POINT
    # Dimension:     XY
    # Bounding box:  xmin: 370057.6 ymin: 163442.6 xmax: 407648.6 ymax: 370423.5
    # Projected CRS: OSGB36 / British National Grid
    #    Temp ID                  geometry
    # 1 13.67  1 POINT (407648.6 233512.2)
    # 2 10.38  2 POINT (370057.6 370423.5)
    # 3 -2.45  3 POINT (393135.4 163442.6)
    

    Note that sf_poly and sf_points have the same CRS, but as stated, you can only be certain that this is valid if your original points coordinates are WGS84. If you know they are in a different CRS, modify crs = 4326 to match the actual CRS. Also, be aware of the order that longitude and latitude are declared in e.g. coords = c("Long", "Lat"). The longitude/x-axis values need to be declared first. If you applied the code from the other answer, your points would be plotted off the coast of Somalia.

    Subset your points_sf using poly_sf

    I have included two options:

    1. Using st_intersection(), only points within poly_sf are returned. You can safely ignore the warning:
    # Return sf_points only if they fall within sf_poly
    sf_inter <- st_intersection(sf_points, sf_poly)
    
    # Warning message:
    #   attribute variables are assumed to be spatially constant throughout all geometries
    
    sf_inter[,c("ID", "RGN23NM")]
    # Simple feature collection with 2 features and 2 fields
    # Geometry type: POINT
    # Dimension:     XY
    # Bounding box:  xmin: 393135.4 ymin: 163442.6 xmax: 407648.6 ymax: 233512.2
    # Projected CRS: OSGB36 / British National Grid
    #   ID    RGN23NM                  geometry
    # 1  1 South West POINT (407648.6 233512.2)
    # 3  3 South West POINT (393135.4 163442.6)
    
    1. Using st_join(), all points are returned, but each point is identified as inside or outside sf_poly:
    # Assign values to all sf_points 
    sf_join <- st_join(sf_points, sf_poly) %>%
      mutate(RGN23NM = if_else(is.na(RGN23NM), "Outside Area", RGN23NM))
    
    sf_join[,c("ID", "RGN23NM")]
    # Simple feature collection with 3 features and 2 fields
    # Geometry type: POINT
    # Dimension:     XY
    # Bounding box:  xmin: 370057.6 ymin: 163442.6 xmax: 407648.6 ymax: 370423.5
    # Projected CRS: OSGB36 / British National Grid
    #   ID      RGN23NM                  geometry
    # 1  1   South West POINT (407648.6 233512.2)
    # 2  2 Outside Area POINT (370057.6 370423.5)
    # 3  3   South West POINT (393135.4 163442.6)
    

    And the result, using sf_join for illustrative purposes:

    ggplot() +
      geom_sf(data = sf_poly) +
      geom_sf(data = sf_join, aes(colour = RGN23NM), size = 2) +
      scale_colour_discrete(name = "Temperature\nPoints")
    

    result