I'm trying to calculate landcover repartition for each US county. I have obtained NLCD for the Apache county using the FedData package (devtools version) and I'm using county shapefiles from the Census bureau.
The problem is that I get an area that is much larger than the official one and the one indicated in my shapefile, namely 51,000km^2 instead of 29,0000km^2 officially. There must be something I don't understand about the raster object but I'm a very confused after hours of websearching, any help appreciated.
The following describes the code used and the method used to calculate. The county data can be downloaded here: https://www2.census.gov/geo/tiger/TIGER2016/COUNTY/
The following code assumes the county shapefile is saved and unzipped.
#devtools::install_github("ropensci/FedData")
library(FedData)
library(rgdal)
library(dplyr)
#Get Apache polygone
counties<- readOGR('tl_2016_us_county/tl_2016_us_county.shp')
apache <- subset(counties,counties$GEOID=="04001")
# Get NCLD data
nlcd_data <- get_nlcd(apache,
year = 2011,
label = "Apache",
force.redo = TRUE)
nlcd_data #inspect the object, can see that number of cells is around 57 million
The calculated total area is too large.
# Calculating the landcover repartition in County
landcover<-data.frame(x2011 = values(nlcd_data)) #Number of rows corresponds to number of cells
landcover_freq<-table(landcover)
df_landcover <- as.data.frame(landcover_freq)
res <- df_landcover %>%
mutate(area_type_sqm = Freq*900,
area_type_km=area_type_sqm/1e6,
area_sqkm = sum(area_type_km))%>%
group_by(landcover)%>%
mutate(pc_land =round(100*area_type_km/area_sqkm,1))
head(arrange(res,desc(pc_land)))
# A tibble: 6 x 6
# Groups: landcover [6]
landcover Freq area_type_sqm area_type_km area_sqkm pc_land
<fct> <int> <dbl> <dbl> <dbl> <dbl>
1 52 33455938 30110344200 30110. 51107. 58.9
2 42 16073820 14466438000 14466. 51107. 28.3
3 71 5999412 5399470800 5399. 51107. 10.6
4 31 488652 439786800 440. 51107. 0.9
5 21 362722 326449800 326. 51107. 0.6
6 22 95545 85990500 86.0 51107. 0.2
## Total area calculated from raster is 51,107 square km
apache_area <- as.data.frame(apache) %>%
mutate(AREA=(as.numeric(ALAND)+as.numeric(AWATER))/1e6) %>%
select(AREA)
apache_area$AREA
29055.47 #Official area of apache county (in square km)
The difference doesn't seem large enough to justify the difference
apache <- spTransform(apache,proj4string(nlcd_data))
plot(nlcd_data)
plot(apache,add=TRUE)
The reason is that you get the data returned in the Mercator projection.
crs(nlcd_data)
CRS arguments:
+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 +units=m +nadgrids=@null +wktext
+no_defs
This coordinate reference system preserves shape and that is why it is used for web-mapping. It should not be used for most other purposes, as it is also notorious for distortion of area.
The take-home message is to never just trust the nominal resolution of a projected raster and assume that it is correct and/or constant. The reliable way to compute area is to use longitude/latitude coordinates, because these are, by definition, not distorted.
The reported spatial resolution is
res(nlcd_data)
[1] 30 30
So it is not surprising that you expected that the cells have an area of 30 x 30 = 900 m2. But the cells sizes are actually between 573 and 625 m2 for Apache county. Illustrated below.
First get the data
library(FedData)
counties <- raster::getData("GADM", country="USA", level=2)
apache <- subset(counties,counties$NAME_2=="Apache")
nlcd <- get_nlcd(apache, year = 2011, label = "nlcd_apache", force.redo = TRUE)
# move to terra
library(terra)
r <- rast(nlcd)
ap <- vect(apache)
# county boundaries to Mercator
apm <- project(ap, crs(r))
To compute the area of the grid cells I represent them as polygons. I first aggregate to get much larger cells to avoid getting too many small polygons (it would take very long, and perhaps crash R). I then transform the Mercator polygons to longitude/latitude, compute their true area, and transform back (just for consistent display purposes).
f <- 300
a <- aggregate(rast(r), f)
p <- as.polygons(a)
# compute area
g <- project(p, "+proj=longlat")
g$area <- round(expanse(g) / (f * f), 5)
# project back and plot
merc <- project(g, crs(r))
plot(merc, "area", border=NA)
lines(apm)
The map shows the approximate variation in the size of the original "900 m2" cells (between 573 and 625). This is not the case when I use the original data, as illustrated below.
library(terra)
# "filename" is the local file that has the nlcd data
usnlcd <- rast(filename)
crs(usnlcd, proj4=TRUE)
#[1] "+proj=aea +lat_0=23 +lon_0=-96 +lat_1=29.5 +lat_2=45.5 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
res(x)
#[1] 30 30
Note that +proj=aea
stands for the Albers Equal Area projection!
ap <- vect(apache)
apm <- project(ap, crs(usnlcd))
x <- crop(usnlcd, apm)
par(mfrow=c(1,2))
plot(x)
lines(apm)
# gg2 computed as above
plot(gg2, "area", border=NA)
As you can see, the cell area is indeed 900 m2, with only very little distortion, so small that it can be ignored.
You could transform the Mercator data back to the original +proj=aea
, but then you would have degraded the quality of the data twice. So that is a really bad idea. You could also account for the true cell area of each cell, but that is rather convoluted.
Finally, to get the area covered by each land cover class
m <- mask(x, apm)
f <- freq(m)
f <- data.frame(f)
f$area <- round(f$count * 900 / 1e6, 1)
# the next step is a bit tricky and will be done by `freq` in future versions
levs <- levels(m)[[1]]
f$class <- levs[f$value + 1]
Voila:
f[, c("class", "area")]
# class area
#1 Open Water 21.5
#2 Developed, Open Space 175.3
#3 Developed, Low Intensity 46.4
#4 Developed, Medium Intensity 9.7
#5 Developed, High Intensity 1.0
#6 Barren Land 232.5
#7 Deciduous Forest 44.6
#8 Evergreen Forest 7604.6
#9 Mixed Forest 2.0
#10 Shrub/Scrub 18262.8
#11 Herbaceuous 2514.4
#12 Hay/Pasture 1.9
#13 Cultivated Crops 0.0
#14 Woody Wetlands 38.8
#15 Emergent Herbaceuous Wetlands 53.6
And the total is as expected
sum(f$area)
#[1] 29009.1
PS. This problem has now been solved at the source --- but I hope this answer will remain useful for others using spatial data with a Mercator CRS.