Search code examples
rareargdalchoropleth

Use polygon area to calculate the population density in R


I'm very new to R and I'm trying to make a density choropleth map. I want to calculate the population density of a administrative boundary in R using rgal:

path <- "recintos_municipales_inspire_peninbal_etrs89.shp"
lineas_limite <- readOGR(path, use_iconv=TRUE, encoding = "UTF-8")

I see how rgal creates a data structure where I can access easily to the shapefile attributes table by lineas_limite@data and to the polygons by lineas_limite@polygons.

I have a field in lineas_limite@data named population which I want to use against something as the native variables available in QGIS like $area.

Then I'd like to create a new density variable to represent it in a choropleth map:

coast@data <- coast@data %>%
  mutate(
      dens = population / area
  )

Is this the best way to make it?

The data comes from the Spanish National Geographic Institute (link to the source).

The dataset name is Municipal boundary lines. The platform will download several files and the one I'm working with is recintos_provinciales_inspire_peninbal_etrs89.

On the other hand, I've created a GitHub repo with the shapefiles but those already have an area field which is the one I'm interesting in and which I created with QGIS (so don't pay attention because I'm trying to generate it from R).

Thank you in advance!


Solution

  • Overview

    To calculate each polygon's area, you'll need to do two things:

    1. Store each polygon's boundaries (seen here as a list of 77 matrices)
    2. Calculate and store each polygon's area using the geosphere::areaPolygon() function.

    From there, you can proceed to calculate each polygon's population density.


    Reproducible Example

    Here is an example that calculates the area for each of the City of Chicago's 77 community areas (i.e. neighborhoods).

    # install necessary packages
    install.packages( pkgs = c( "geosphere", "sp", "rgdal" ) )
    
    # load necessary packages
    library( geosphere )
    library( sp )
    library( rgdal )
    
    # store Chicago current community area
    # GeoJSON URL as a character vector
    geojson_comarea_url <- "https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=GeoJSON"
    
    # transform URL character vector into spatial dataframe
    comarea606 <- readOGR( dsn = geojson_comarea_url
                           , layer = "OGRGeoJSON"
                           , stringsAsFactors = FALSE
                           , verbose = FALSE # to hide progress message after object is created
    )
    
    ## WRONG WAY is trusting the area slot within each polygon
    wrong.list.of.polygon.areas <-
      data.frame(
        comarea606_Name           = comarea606$community
        , comarea606_Inherit_Area = sapply( X = slot( object = comarea606, name = "polygons" )
                                    , FUN = function(i) 
                                      i@area )
        , stringsAsFactors = FALSE
      )
    
    # view the wrong results
    wrong.list.of.polygon.areas
    #           comarea606_Name comarea606_Inherit_Area
    # 1                 DOUGLAS            0.0004632396
    # 2                 OAKLAND            0.0001702833
    # 3             FULLER PARK            0.0002004699
    # 4         GRAND BOULEVARD            0.0004881245
    # 5                 KENWOOD            0.0002926158
    # 6          LINCOLN SQUARE            0.0007200422
    # 7         WASHINGTON PARK            0.0004263996
    # 8               HYDE PARK            0.0004538953
    # 9                WOODLAWN            0.0005816583
    # 10            ROGERS PARK            0.0005175561
    # 11         JEFFERSON PARK            0.0006546563
    # 12            FOREST GLEN            0.0008997264
    # 13             NORTH PARK            0.0007094071
    # 14            ALBANY PARK            0.0005402586
    # 15           PORTAGE PARK            0.0011116825
    # 16            IRVING PARK            0.0009040059
    # 17                DUNNING            0.0010449760
    # 18              MONTCLARE            0.0002780923
    # 19         BELMONT CRAGIN            0.0011001642
    # 20             WEST RIDGE            0.0009936912
    # 21                HERMOSA            0.0003287441
    # 22               AVONDALE            0.0005576454
    # 23           LOGAN SQUARE            0.0010089020
    # 24          HUMBOLDT PARK            0.0010128203
    # 25              WEST TOWN            0.0012858100
    # 26                 AUSTIN            0.0020082609
    # 27     WEST GARFIELD PARK            0.0003636864
    # 28     EAST GARFIELD PARK            0.0005429495
    # 29         NEAR WEST SIDE            0.0015968996
    # 30         NORTH LAWNDALE            0.0009014540
    # 31                 UPTOWN            0.0006568040
    # 32         SOUTH LAWNDALE            0.0012889745
    # 33        LOWER WEST SIDE            0.0008213691
    # 34        NEAR SOUTH SIDE            0.0005013217
    # 35          ARMOUR SQUARE            0.0002796204
    # 36           NORWOOD PARK            0.0012862114
    # 37        NEAR NORTH SIDE            0.0007728518
    # 38                   LOOP            0.0004668873
    # 39            SOUTH SHORE            0.0008228658
    # 40                CHATHAM            0.0008277123
    # 41            AVALON PARK            0.0003504538
    # 42          SOUTH CHICAGO            0.0009378263
    # 43               BURNSIDE            0.0001708577
    # 44        CALUMET HEIGHTS            0.0004908500
    # 45               ROSELAND            0.0013497948
    # 46           NORTH CENTER            0.0005755101
    # 47                PULLMAN            0.0005929323
    # 48          SOUTH DEERING            0.0030522425
    # 49              EAST SIDE            0.0008365337
    # 50           WEST PULLMAN            0.0009980793
    # 51              RIVERDALE            0.0009880640
    # 52              HEGEWISCH            0.0014658303
    # 53         GARFIELD RIDGE            0.0011864234
    # 54         ARCHER HEIGHTS            0.0005629105
    # 55          BRIGHTON PARK            0.0007640013
    # 56          MCKINLEY PARK            0.0003970284
    # 57              LAKE VIEW            0.0008796889
    # 58             BRIDGEPORT            0.0005869749
    # 59               NEW CITY            0.0013551840
    # 60            WEST ELSDON            0.0003291759
    # 61              GAGE PARK            0.0006167374
    # 62               CLEARING            0.0007157968
    # 63              WEST LAWN            0.0008280548
    # 64           CHICAGO LAWN            0.0009886720
    # 65         WEST ENGLEWOOD            0.0008847861
    # 66              ENGLEWOOD            0.0008617058
    # 67 GREATER GRAND CROSSING            0.0009942937
    # 68           LINCOLN PARK            0.0008905024
    # 69                ASHBURN            0.0013621619
    # 70         AUBURN GRESHAM            0.0010564790
    # 71                BEVERLY            0.0008922944
    # 72     WASHINGTON HEIGHTS            0.0008004438
    # 73        MOUNT GREENWOOD            0.0007594687
    # 74            MORGAN PARK            0.0009230990
    # 75                  OHARE            0.0037524990
    # 76              EDGEWATER            0.0004890111
    # 77            EDISON PARK            0.0003194218
    
    
    ## RIGHT WAY is calculating the area yourself
    
    # store each polygon's boundaries
    list.of.polygon.boundaries <- sapply( X = slot( object = comarea606, name = "polygons" )
                                     , FUN = function(i) 
                                       sp::coordinates( obj = slot( object = i, name = "Polygons")[[1]] )
    )
    
    # calculate each polygon's area in square meters
    comarea606@data$Square_Meter_Area <- sapply( X = list.of.polygon.boundaries, FUN = geosphere::areaPolygon )
    
    # view the right results
    comarea606@data[ c( "community", "Square_Meter_Area" ) ]
    #                community            Square_Meter_Area
    # 1                 DOUGLAS                      4273829
    # 2                 OAKLAND                      1571301
    # 3             FULLER PARK                      1850268
    # 4         GRAND BOULEVARD                      4504952
    # 5                 KENWOOD                      2700750
    # 6          LINCOLN SQUARE                      6628739
    # 7         WASHINGTON PARK                      3936533
    # 8               HYDE PARK                      4190262
    # 9                WOODLAWN                      5370998
    # 10            ROGERS PARK                      4762104
    # 11         JEFFERSON PARK                      6026453
    # 12            FOREST GLEN                      8280516
    # 13             NORTH PARK                      6529977
    # 14            ALBANY PARK                      4974190
    # 15           PORTAGE PARK                     10237543
    # 16            IRVING PARK                      8325095
    # 17                DUNNING                      9624357
    # 18              MONTCLARE                      2561945
    # 19         BELMONT CRAGIN                     10135661
    # 20             WEST RIDGE                      9144227
    # 21                HERMOSA                      3028810
    # 22               AVONDALE                      5136605
    # 23           LOGAN SQUARE                      9295521
    # 24          HUMBOLDT PARK                      9334890
    # 25              WEST TOWN                     11850755
    # 26                 AUSTIN                     18511300
    # 27     WEST GARFIELD PARK                      3353110
    # 28     EAST GARFIELD PARK                      5005851
    # 29         NEAR WEST SIDE                     14724107
    # 30         NORTH LAWNDALE                      8313567
    # 31                 UPTOWN                      6047440
    # 32         SOUTH LAWNDALE                     11891299
    # 33        LOWER WEST SIDE                      7576149
    # 34        NEAR SOUTH SIDE                      4623602
    # 35          ARMOUR SQUARE                      2579489
    # 36           NORWOOD PARK                     11839116
    # 37        NEAR NORTH SIDE                      7123217
    # 38                   LOOP                      4304581
    # 39            SOUTH SHORE                      7600312
    # 40                CHATHAM                      7647583
    # 41            AVALON PARK                      3237793
    # 42          SOUTH CHICAGO                      8664835
    # 43               BURNSIDE                      1578918
    # 44        CALUMET HEIGHTS                      4535904
    # 45               ROSELAND                     12477753
    # 46           NORTH CENTER                      5300414
    # 47                PULLMAN                      5481215
    # 48          SOUTH DEERING                     28222390
    # 49              EAST SIDE                      7732987
    # 50           WEST PULLMAN                      9231061
    # 51              RIVERDALE                      9140344
    # 52              HEGEWISCH                     13559960
    # 53         GARFIELD RIDGE                     10952400
    # 54         ARCHER HEIGHTS                      5195326
    # 55          BRIGHTON PARK                      7050570
    # 56          MCKINLEY PARK                      3663260
    # 57              LAKE VIEW                      8102329
    # 58             BRIDGEPORT                      5415321
    # 59               NEW CITY                     12507893
    # 60            WEST ELSDON                      3038932
    # 61              GAGE PARK                      5693468
    # 62               CLEARING                      6609603
    # 63              WEST LAWN                      7647276
    # 64           CHICAGO LAWN                      9130323
    # 65         WEST ENGLEWOOD                      8170431
    # 66              ENGLEWOOD                      7957145
    # 67 GREATER GRAND CROSSING                      9183452
    # 68           LINCOLN PARK                      8204655
    # 69                ASHBURN                     12584519
    # 70         AUBURN GRESHAM                      9760656
    # 71                BEVERLY                      8247709
    # 72     WASHINGTON HEIGHTS                      7398212
    # 73        MOUNT GREENWOOD                      7021929
    # 74            MORGAN PARK                      8535503
    # 75                  OHARE                     34379638
    # 76              EDGEWATER                      4501053
    # 77            EDISON PARK                      2939134
    
    # end of script # 
    

    Mistrusting the Inherit 'area' Slot Within Each Polygon

    According to the documentation for ?sp::Polygons-class, the area slot within each polygon is not the actual area.

    Object of class "numeric"; the gross total planar area of the Polygon list but not double-counting holes (changed from 0.9-58 - islands are summed, holes are ignored rather than subtracted); these values are used to make sure that polygons of a smaller area are plotted after polygons of a larger area, does not respect projection as objects of this class have no projection defined.

    It seems more like a placeholder than the actual value you are seeking, which is why I used the geosphere::areaPolygon() function to obtain that value.


    Update

    The OP's data was made available via a GitHub repository. Down below is the same logic from above, only using different data.

    # install necessary packages
    install.packages( pkgs = c( "geosphere", "sp", "rgdal" ) )
    
    # load necessary packages
    library( geosphere )
    library( sp )
    library( rgdal )
    
    # load necessary data
    
    # download a .zip file of the GitHub repository
    download.file( url = "https://github.com/LuisSevillano/lineas-limite/archive/master.zip"
                                         , destfile = "lineas-limite-master.zip"
    ) #39.2MB ~ 1 minute DL time on my crappy wifi
    
    # unzip the entire GitHub repository
    unzip( zipfile = "lineas-limite-master.zip" )
    
    # set working directory to be
    # inside the sub-folder where
    # the shapefiles of interest are located
    setwd( dir = "/Users/cristiannuno/Downloads/lineas-limite-master/recintos_municipales_inspire_peninbal_etrs89/" )
    
    # create spatial polygon data frame
    lineas_limite <- rgdal::readOGR( dsn = getwd()
                                     , layer = "recintos_municipales_inspire_peninbal_etrs89"
                                     , stringsAsFactors = FALSE
                                     )
    
    
    # store each polygon's boundaries
    list.of.polygon.boundaries <- sapply( X = slot( object = lineas_limite, name = "polygons" )
                                          , FUN = function(i) 
                                            sp::coordinates( obj = slot( object = i, name = "Polygons")[[1]] )
    )
    
    # calculate each polygon's area in square meters
    lineas_limite@data$Square_Meter_Area <- sapply( X = list.of.polygon.boundaries
                                                 , FUN = geosphere::areaPolygon 
    )
    
    # View the first six results
    head(
      lineas_limite@data[ c( "NAMEUNIT", "Square_Meter_Area" ) ]
    )
    
    #    NAMEUNIT Square_Meter_Area
    # 0      Abla          45244459
    # 1  Abrucena          83677254
    # 2      Adra          89694443
    # 3 Albánchez          35133840
    # 4 Alboloduy          69734542
    # 5  Pradejón          31755520
    
    # end of script #
    

    Resources

    I encourage you to read through the ?sp::Polygons-class documentation, as well as the SpatialPointsDataFrame properties and operators in R to learn more about spatial polygons and how to use them in R.


    Session Info

    R version 3.4.3 (2017-11-30)
    Platform: x86_64-apple-darwin15.6.0 (64-bit)
    Running under: macOS Sierra 10.12.6
    
    Matrix products: default
    BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
    LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
    
    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
    [1] rgdal_1.2-16    geosphere_1.5-7 sp_1.2-7       
    
    loaded via a namespace (and not attached):
     [1] Rcpp_0.12.15       lattice_0.20-35    digest_0.6.14      mime_0.5          
     [5] grid_3.4.3         R6_2.2.2           xtable_1.8-2       magrittr_1.5      
     [9] leaflet_1.1.0.9000 tools_3.4.3        htmlwidgets_1.0    crosstalk_1.0.0   
    [13] shiny_1.0.5        httpuv_1.3.5       yaml_2.1.16        compiler_3.4.3    
    [17] htmltools_0.3.6 
    

    RStudio Version

    $citation
    
    To cite RStudio in publications use:
    
      RStudio Team (2016). RStudio: Integrated Development for R. RStudio,
      Inc., Boston, MA URL http://www.rstudio.com/.
    
    A BibTeX entry for LaTeX users is
    
      @Manual{,
        title = {RStudio: Integrated Development Environment for R},
        author = {{RStudio Team}},
        organization = {RStudio, Inc.},
        address = {Boston, MA},
        year = {2016},
        url = {http://www.rstudio.com/},
      }
    
    
    $mode
    [1] "desktop"
    
    $version
    [1] ‘1.1.419’