I'm very new to R and I'm trying to make a density choropleth map. I want to calculate the population density of a administrative boundary in R using rgal
:
path <- "recintos_municipales_inspire_peninbal_etrs89.shp"
lineas_limite <- readOGR(path, use_iconv=TRUE, encoding = "UTF-8")
I see how rgal
creates a data structure where I can access easily to the shapefile
attributes table by lineas_limite@data
and to the polygons by lineas_limite@polygons
.
I have a field in lineas_limite@data
named population
which I want to use against something as the native variables available in QGIS like $area
.
Then I'd like to create a new density
variable to represent it in a choropleth map:
coast@data <- coast@data %>%
mutate(
dens = population / area
)
Is this the best way to make it?
The data comes from the Spanish National Geographic Institute (link to the source).
The dataset name is Municipal boundary lines. The platform will download several files and the one I'm working with is recintos_provinciales_inspire_peninbal_etrs89
.
On the other hand, I've created a GitHub repo with the shapefiles but those already have an area field which is the one I'm interesting in and which I created with QGIS (so don't pay attention because I'm trying to generate it from R).
Thank you in advance!
To calculate each polygon's area, you'll need to do two things:
geosphere::areaPolygon()
function.From there, you can proceed to calculate each polygon's population density.
Here is an example that calculates the area for each of the City of Chicago's 77 community areas (i.e. neighborhoods).
# install necessary packages
install.packages( pkgs = c( "geosphere", "sp", "rgdal" ) )
# load necessary packages
library( geosphere )
library( sp )
library( rgdal )
# store Chicago current community area
# GeoJSON URL as a character vector
geojson_comarea_url <- "https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=GeoJSON"
# transform URL character vector into spatial dataframe
comarea606 <- readOGR( dsn = geojson_comarea_url
, layer = "OGRGeoJSON"
, stringsAsFactors = FALSE
, verbose = FALSE # to hide progress message after object is created
)
## WRONG WAY is trusting the area slot within each polygon
wrong.list.of.polygon.areas <-
data.frame(
comarea606_Name = comarea606$community
, comarea606_Inherit_Area = sapply( X = slot( object = comarea606, name = "polygons" )
, FUN = function(i)
i@area )
, stringsAsFactors = FALSE
)
# view the wrong results
wrong.list.of.polygon.areas
# comarea606_Name comarea606_Inherit_Area
# 1 DOUGLAS 0.0004632396
# 2 OAKLAND 0.0001702833
# 3 FULLER PARK 0.0002004699
# 4 GRAND BOULEVARD 0.0004881245
# 5 KENWOOD 0.0002926158
# 6 LINCOLN SQUARE 0.0007200422
# 7 WASHINGTON PARK 0.0004263996
# 8 HYDE PARK 0.0004538953
# 9 WOODLAWN 0.0005816583
# 10 ROGERS PARK 0.0005175561
# 11 JEFFERSON PARK 0.0006546563
# 12 FOREST GLEN 0.0008997264
# 13 NORTH PARK 0.0007094071
# 14 ALBANY PARK 0.0005402586
# 15 PORTAGE PARK 0.0011116825
# 16 IRVING PARK 0.0009040059
# 17 DUNNING 0.0010449760
# 18 MONTCLARE 0.0002780923
# 19 BELMONT CRAGIN 0.0011001642
# 20 WEST RIDGE 0.0009936912
# 21 HERMOSA 0.0003287441
# 22 AVONDALE 0.0005576454
# 23 LOGAN SQUARE 0.0010089020
# 24 HUMBOLDT PARK 0.0010128203
# 25 WEST TOWN 0.0012858100
# 26 AUSTIN 0.0020082609
# 27 WEST GARFIELD PARK 0.0003636864
# 28 EAST GARFIELD PARK 0.0005429495
# 29 NEAR WEST SIDE 0.0015968996
# 30 NORTH LAWNDALE 0.0009014540
# 31 UPTOWN 0.0006568040
# 32 SOUTH LAWNDALE 0.0012889745
# 33 LOWER WEST SIDE 0.0008213691
# 34 NEAR SOUTH SIDE 0.0005013217
# 35 ARMOUR SQUARE 0.0002796204
# 36 NORWOOD PARK 0.0012862114
# 37 NEAR NORTH SIDE 0.0007728518
# 38 LOOP 0.0004668873
# 39 SOUTH SHORE 0.0008228658
# 40 CHATHAM 0.0008277123
# 41 AVALON PARK 0.0003504538
# 42 SOUTH CHICAGO 0.0009378263
# 43 BURNSIDE 0.0001708577
# 44 CALUMET HEIGHTS 0.0004908500
# 45 ROSELAND 0.0013497948
# 46 NORTH CENTER 0.0005755101
# 47 PULLMAN 0.0005929323
# 48 SOUTH DEERING 0.0030522425
# 49 EAST SIDE 0.0008365337
# 50 WEST PULLMAN 0.0009980793
# 51 RIVERDALE 0.0009880640
# 52 HEGEWISCH 0.0014658303
# 53 GARFIELD RIDGE 0.0011864234
# 54 ARCHER HEIGHTS 0.0005629105
# 55 BRIGHTON PARK 0.0007640013
# 56 MCKINLEY PARK 0.0003970284
# 57 LAKE VIEW 0.0008796889
# 58 BRIDGEPORT 0.0005869749
# 59 NEW CITY 0.0013551840
# 60 WEST ELSDON 0.0003291759
# 61 GAGE PARK 0.0006167374
# 62 CLEARING 0.0007157968
# 63 WEST LAWN 0.0008280548
# 64 CHICAGO LAWN 0.0009886720
# 65 WEST ENGLEWOOD 0.0008847861
# 66 ENGLEWOOD 0.0008617058
# 67 GREATER GRAND CROSSING 0.0009942937
# 68 LINCOLN PARK 0.0008905024
# 69 ASHBURN 0.0013621619
# 70 AUBURN GRESHAM 0.0010564790
# 71 BEVERLY 0.0008922944
# 72 WASHINGTON HEIGHTS 0.0008004438
# 73 MOUNT GREENWOOD 0.0007594687
# 74 MORGAN PARK 0.0009230990
# 75 OHARE 0.0037524990
# 76 EDGEWATER 0.0004890111
# 77 EDISON PARK 0.0003194218
## RIGHT WAY is calculating the area yourself
# store each polygon's boundaries
list.of.polygon.boundaries <- sapply( X = slot( object = comarea606, name = "polygons" )
, FUN = function(i)
sp::coordinates( obj = slot( object = i, name = "Polygons")[[1]] )
)
# calculate each polygon's area in square meters
comarea606@data$Square_Meter_Area <- sapply( X = list.of.polygon.boundaries, FUN = geosphere::areaPolygon )
# view the right results
comarea606@data[ c( "community", "Square_Meter_Area" ) ]
# community Square_Meter_Area
# 1 DOUGLAS 4273829
# 2 OAKLAND 1571301
# 3 FULLER PARK 1850268
# 4 GRAND BOULEVARD 4504952
# 5 KENWOOD 2700750
# 6 LINCOLN SQUARE 6628739
# 7 WASHINGTON PARK 3936533
# 8 HYDE PARK 4190262
# 9 WOODLAWN 5370998
# 10 ROGERS PARK 4762104
# 11 JEFFERSON PARK 6026453
# 12 FOREST GLEN 8280516
# 13 NORTH PARK 6529977
# 14 ALBANY PARK 4974190
# 15 PORTAGE PARK 10237543
# 16 IRVING PARK 8325095
# 17 DUNNING 9624357
# 18 MONTCLARE 2561945
# 19 BELMONT CRAGIN 10135661
# 20 WEST RIDGE 9144227
# 21 HERMOSA 3028810
# 22 AVONDALE 5136605
# 23 LOGAN SQUARE 9295521
# 24 HUMBOLDT PARK 9334890
# 25 WEST TOWN 11850755
# 26 AUSTIN 18511300
# 27 WEST GARFIELD PARK 3353110
# 28 EAST GARFIELD PARK 5005851
# 29 NEAR WEST SIDE 14724107
# 30 NORTH LAWNDALE 8313567
# 31 UPTOWN 6047440
# 32 SOUTH LAWNDALE 11891299
# 33 LOWER WEST SIDE 7576149
# 34 NEAR SOUTH SIDE 4623602
# 35 ARMOUR SQUARE 2579489
# 36 NORWOOD PARK 11839116
# 37 NEAR NORTH SIDE 7123217
# 38 LOOP 4304581
# 39 SOUTH SHORE 7600312
# 40 CHATHAM 7647583
# 41 AVALON PARK 3237793
# 42 SOUTH CHICAGO 8664835
# 43 BURNSIDE 1578918
# 44 CALUMET HEIGHTS 4535904
# 45 ROSELAND 12477753
# 46 NORTH CENTER 5300414
# 47 PULLMAN 5481215
# 48 SOUTH DEERING 28222390
# 49 EAST SIDE 7732987
# 50 WEST PULLMAN 9231061
# 51 RIVERDALE 9140344
# 52 HEGEWISCH 13559960
# 53 GARFIELD RIDGE 10952400
# 54 ARCHER HEIGHTS 5195326
# 55 BRIGHTON PARK 7050570
# 56 MCKINLEY PARK 3663260
# 57 LAKE VIEW 8102329
# 58 BRIDGEPORT 5415321
# 59 NEW CITY 12507893
# 60 WEST ELSDON 3038932
# 61 GAGE PARK 5693468
# 62 CLEARING 6609603
# 63 WEST LAWN 7647276
# 64 CHICAGO LAWN 9130323
# 65 WEST ENGLEWOOD 8170431
# 66 ENGLEWOOD 7957145
# 67 GREATER GRAND CROSSING 9183452
# 68 LINCOLN PARK 8204655
# 69 ASHBURN 12584519
# 70 AUBURN GRESHAM 9760656
# 71 BEVERLY 8247709
# 72 WASHINGTON HEIGHTS 7398212
# 73 MOUNT GREENWOOD 7021929
# 74 MORGAN PARK 8535503
# 75 OHARE 34379638
# 76 EDGEWATER 4501053
# 77 EDISON PARK 2939134
# end of script #
According to the documentation for ?sp::Polygons-class
, the area
slot within each polygon is not the actual area.
Object of class "numeric"; the gross total planar area of the Polygon list but not double-counting holes (changed from 0.9-58 - islands are summed, holes are ignored rather than subtracted); these values are used to make sure that polygons of a smaller area are plotted after polygons of a larger area, does not respect projection as objects of this class have no projection defined.
It seems more like a placeholder than the actual value you are seeking, which is why I used the geosphere::areaPolygon()
function to obtain that value.
The OP's data was made available via a GitHub repository. Down below is the same logic from above, only using different data.
# install necessary packages
install.packages( pkgs = c( "geosphere", "sp", "rgdal" ) )
# load necessary packages
library( geosphere )
library( sp )
library( rgdal )
# load necessary data
# download a .zip file of the GitHub repository
download.file( url = "https://github.com/LuisSevillano/lineas-limite/archive/master.zip"
, destfile = "lineas-limite-master.zip"
) #39.2MB ~ 1 minute DL time on my crappy wifi
# unzip the entire GitHub repository
unzip( zipfile = "lineas-limite-master.zip" )
# set working directory to be
# inside the sub-folder where
# the shapefiles of interest are located
setwd( dir = "/Users/cristiannuno/Downloads/lineas-limite-master/recintos_municipales_inspire_peninbal_etrs89/" )
# create spatial polygon data frame
lineas_limite <- rgdal::readOGR( dsn = getwd()
, layer = "recintos_municipales_inspire_peninbal_etrs89"
, stringsAsFactors = FALSE
)
# store each polygon's boundaries
list.of.polygon.boundaries <- sapply( X = slot( object = lineas_limite, name = "polygons" )
, FUN = function(i)
sp::coordinates( obj = slot( object = i, name = "Polygons")[[1]] )
)
# calculate each polygon's area in square meters
lineas_limite@data$Square_Meter_Area <- sapply( X = list.of.polygon.boundaries
, FUN = geosphere::areaPolygon
)
# View the first six results
head(
lineas_limite@data[ c( "NAMEUNIT", "Square_Meter_Area" ) ]
)
# NAMEUNIT Square_Meter_Area
# 0 Abla 45244459
# 1 Abrucena 83677254
# 2 Adra 89694443
# 3 Albánchez 35133840
# 4 Alboloduy 69734542
# 5 Pradejón 31755520
# end of script #
I encourage you to read through the ?sp::Polygons-class
documentation, as well as the SpatialPointsDataFrame properties and operators in R to learn more about spatial polygons and how to use them in R.
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rgdal_1.2-16 geosphere_1.5-7 sp_1.2-7
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 lattice_0.20-35 digest_0.6.14 mime_0.5
[5] grid_3.4.3 R6_2.2.2 xtable_1.8-2 magrittr_1.5
[9] leaflet_1.1.0.9000 tools_3.4.3 htmlwidgets_1.0 crosstalk_1.0.0
[13] shiny_1.0.5 httpuv_1.3.5 yaml_2.1.16 compiler_3.4.3
[17] htmltools_0.3.6
$citation
To cite RStudio in publications use:
RStudio Team (2016). RStudio: Integrated Development for R. RStudio,
Inc., Boston, MA URL http://www.rstudio.com/.
A BibTeX entry for LaTeX users is
@Manual{,
title = {RStudio: Integrated Development Environment for R},
author = {{RStudio Team}},
organization = {RStudio, Inc.},
address = {Boston, MA},
year = {2016},
url = {http://www.rstudio.com/},
}
$mode
[1] "desktop"
$version
[1] ‘1.1.419’