Search code examples
rchoroplethspdf

How to subset polygons using distance in a SPDF


Ok. So, I am learning and made a very similar question when I hadn't learnt what a Spatial Polygons Data Frame was about 10 days ago: Select raster in ggplot near coastline.

Now, I have discovered the magic of SPDF and choropleth maps and have in essence the same question but with different types of files. I am still wrapping my head around S4 objects and I can't figure out how to subset certain mini-polygons MUNICIPI from my data set.

To the point!

Context: I have a SPDF that contains these data: Dataframe inside cataconfidence

Aim:

I would like to subset all MUNICIPI that are within a certain distance from the coast. Let's say 20km as @urbandatascientist set in his answer to my first question and create a choropleth map of, for example, upper_trees with the subsetted MUNICIPI.

From the Select raster in ggplot near coastline post I also have the list.of.polygon.boundaries that we'll substract MUNICIPI coordinates from.

Data is here.

Once I subset the coastal MUNICIPI, I'm hoping the map will look something like the green shaded region here. I have also tried to make sure that coordinates are in the same between list.of.polygon.boundaries.

Any clues or ideas would be greatly appreciated!

So far, here's my chloropleth map for the entire region using upper_trees as an example:

tm_shape(catasense2)+ tm_fill(col="upper_trees",n=8,style="quantile")+ tm_layout(legend.outside =TRUE)

map for the whole region[2]


Solution

  • Overview

    Similar to the answer for Select raster in ggplot near coastline, the solution involves the following steps:

    • Calculating the coordinates for the boundaries of the Balearic (Iberian Sea) and Western Basin portions of the shape file for the western part of the Mediterranean Sea.

    • Calculating the centroids of each polygon in MUNICIPI from the OP's link to her Google Drive folder, which contains a .zip file of the shape file.

    • Calculate the distance between the first two points and subset MUNICIPI to show the polygons whose distance from their centroid to the Western Basin is less than or equal to 20 kilometers.

    Filtering the Coordinates of the Western Basin

    Rather than calculating the distance of each centroid to each coordinate pair within western.basin.polygon.coordinates, I only included coordinates within western.basin.polygon.coordinates whose latitudinal point was in between (inclusive) of the eastern coast of Catalonia.

    For reference, I use the latitudinal points of Peniscola, Catalonia and Cerbere, France. By only keeping the coordinate pairs that lay along the eastern coast of Catalonia, the distance calculation between western.basin.polygon.coordinates and each centroid in MUNICIPI completes in about ~4 minutes.

    SS of Google Maps

    I then stored the indices of the polygons within MUNICIPI whose centroid distances were less than or equal to 20 kilometers in less.than.or.equal.to.max.km - a logical vector of TRUE/FALSE values. Using a leaflet map, I show how I subset MUNICIPI to only visualize those polygons that contain TRUE values within less.than.or.equal.to.max.km.

    # load necessary packages
    library( geosphere )
    library( leaflet )
    library( sf )
    
    # load necessary data
    unzip( zipfile = "catasense2-20180327T023956Z-001.zip" )
    
    # transfrom zip file contents
    # into a sf data frame
    MUNICIPI <-
      read_sf(
        dsn = paste0( getwd(), "/catasense2" )
        , layer = "catasense2"
      )
    
    
    # map data values to colors
    my.color.factor <- 
      colorFactor( palette = "viridis", domain = MUNICIPI$uppr_tr )
    
    # Visualize
    leaflet( data = MUNICIPI ) %>%
      setView( lng = 1.514619
               , lat = 41.875227
               , zoom = 8 ) %>%
      addTiles() %>%
      addPolygons( color = ~my.color.factor( x = uppr_tr ) )
    

    SS of All of MUNICIPI

    # unzip the western basin zip file
    # unzip( zipfile = "iho.zip" )
    
    # create sf data frame
    # of the western basin
    western.basin <-
      read_sf( dsn = getwd()
               , layer = "iho"
               , stringsAsFactors = FALSE )
    
    # filter the western.basin
    # to only include those bodies of water
    # nearest catalonia
    water.near.catalonia.condition <-
      which( 
        western.basin$name %in% 
          c( "Balearic (Iberian Sea)"
             , "Mediterranean Sea - Western Basin" )
        )
    
    western.basin <-
      western.basin[ water.near.catalonia.condition, ]
    
    # identify the coordinates for each of the
    # remaining polygons in the western.basin
    # get the boundaries of each
    # polygon within the western basin
    # and retain only the lon (X) and lat (Y) values
    western.basin.polygon.coordinates <-
      lapply( 
        X = western.basin$geometry
        , FUN = function( i )
          st_coordinates( i )[ , c( "X", "Y" ) ]
      )
    
    # find the centroid
    # of each polygon in MUNICIPI
    MUNICIPI$centroids <-
      st_centroid( x = MUNICIPI$geometry )
    
    # Warning message:
    #   In st_centroid.sfc(x = MUNICIPI$geometry) :
    #   st_centroid does not give correct centroids for longitude/latitude data
    
    # store the latitude
    # of the western (Peniscola, Catalonia) and eastern (Cerbere, France) 
    # most parts of the eastern coast of Catalonia
    east.west.lat.range <- 
      setNames( object = c( 40.3772185, 42.4469981 )
                , nm = c( "east", "west" )
      )
    
    # set the maximum distance
    # allowed between a point in df
    # and the sea to 20 kilometers
    max.km <- 20
    
    # store the indices in the
    # western basin polygon coordinates whose
    # lat points
    # fall in between (inclusive) 
    # of the east.west.lat.range
    satisfy.east.west.lat.max.condition <-
      lapply(
        X = western.basin.polygon.coordinates
        , FUN = function(i)
          which(
            i[, "Y"] >= east.west.lat.range["east"] &
              i[, "Y"] <= east.west.lat.range["west"]
          )
      )
    
    # filter the western basin polygon coordinate
    # by the condition
    western.basin.polygon.coordinates <-
      mapply(
        FUN = function( i, j )
          i[ j, ]
        , western.basin.polygon.coordinates
        , satisfy.east.west.lat.max.condition
      )
    
    # calculate the distance of each centroid
    # in MUNICIPI
    # to each point in both western.basin
    # polygon boundary coordinates
    # Takes ~4 minutes
    distance <-
      apply(
        X = do.call( what = "rbind", args = MUNICIPI$centroids )
        , MARGIN = 1
        , FUN = function( i )
          lapply(
            X = western.basin.polygon.coordinates
            , FUN = function( j )
              distGeo(
                p1 = i
                , p2 = j
              ) / 1000 # to transform results into kilometers
          )
      )
    
    # 1.08 GB list is returned
    object.size( x = distance)
    # 1088805736 bytes
    
    # find the minimum distance value
    # for each list in distance
    distance.min <-
      lapply(
        X = distance
        , FUN = function( i )
          lapply(
            X = i
            , FUN = function( j )
              min( j )
          )
      )
    
    # identify which points in df
    # are less than or equal to max.km
    less.than.or.equal.to.max.km <-
      sapply(
        X = distance.min
        , FUN = function( i )
          sapply(
            X = i
            , FUN = function( j )
              j <= max.km
          )
      )
    
    # convert matrix results into
    # vector of TRUE/FALSE indices
    less.than.or.equal.to.max.km <-
      apply(
        X = less.than.or.equal.to.max.km
        , MARGIN = 2
        , FUN = any
      )
    
    # now only color data that meets
    # the less.than.or.equal.to.max.km condition
    leaflet( data = MUNICIPI[ less.than.or.equal.to.max.km, ] ) %>%
      setView( lng = 1.514619
               , lat = 41.875227
               , zoom = 8 ) %>%
      addTiles() %>%
      addPolygons( color = ~my.color.factor( x = uppr_tr ) ) 
    
    # end of script #
    

    SS of Filtered MUNICIPI

    Session Info

    R version 3.4.4 (2018-03-15)
    Platform: x86_64-apple-darwin15.6.0 (64-bit)
    Running under: macOS High Sierra 10.13.2
    
    Matrix products: default
    BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
    LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
    
    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods  
    [7] base     
    
    other attached packages:
    [1] sf_0.6-0           leaflet_1.1.0.9000 geosphere_1.5-7   
    
    loaded via a namespace (and not attached):
     [1] mclust_5.4         Rcpp_0.12.16       mvtnorm_1.0-7     
     [4] lattice_0.20-35    class_7.3-14       rprojroot_1.3-2   
     [7] digest_0.6.15      mime_0.5           R6_2.2.2          
    [10] plyr_1.8.4         backports_1.1.2    stats4_3.4.4      
    [13] evaluate_0.10.1    e1071_1.6-8        ggplot2_2.2.1     
    [16] pillar_1.2.1       rlang_0.2.0        lazyeval_0.2.1    
    [19] diptest_0.75-7     whisker_0.3-2      kernlab_0.9-25    
    [22] R.utils_2.6.0      R.oo_1.21.0        rmarkdown_1.9     
    [25] udunits2_0.13      stringr_1.3.0      htmlwidgets_1.0   
    [28] munsell_0.4.3      shiny_1.0.5        compiler_3.4.4    
    [31] httpuv_1.3.6.2     htmltools_0.3.6    nnet_7.3-12       
    [34] tibble_1.4.2       gridExtra_2.3      dendextend_1.7.0  
    [37] viridisLite_0.3.0  MASS_7.3-49        R.methodsS3_1.7.1 
    [40] grid_3.4.4         jsonlite_1.5       xtable_1.8-2      
    [43] gtable_0.2.0       DBI_0.8            magrittr_1.5      
    [46] units_0.5-1        scales_0.5.0       stringi_1.1.7     
    [49] reshape2_1.4.3     viridis_0.5.0      flexmix_2.3-14    
    [52] sp_1.2-7           robustbase_0.92-8  squash_1.0.8      
    [55] RColorBrewer_1.1-2 tools_3.4.4        fpc_2.1-11        
    [58] trimcluster_0.1-2  DEoptimR_1.0-8     crosstalk_1.0.0   
    [61] yaml_2.1.18        colorspace_1.3-2   cluster_2.0.6     
    [64] prabclus_2.2-6     classInt_0.1-24    knitr_1.20        
    [67] modeltools_0.2-21