I use a raster file that has global application rate (which is in kg/hectare) of glyphosate to soybeans in 2015. I want to calculate the mean application rate for each country and also get the total application in kg.
I have tried to extract the data but when I check it it doesn't add up so I would need some help on where I am going wrong.
Here is my code:
## Load libraries and read data ----
library(raster)
library(tidyverse)
library(maptools) # To get the borders of countries
# Border of countries
data(wrld_simpl)
# Make a data frame to put my numbers on the corresponding countries
world_data_empty = data.frame(country = wrld_simpl$NAME, iso3c = wrld_simpl$ISO3)
# This is a file with the application rate of glyphosate to soybeans in 2015 in kg/ha
raster_path = "https://raw.github.com/hansronald/Pesticide-data/master/APR_Soybean_Glyphosate_2015_L.tif"
sqm_to_ha_conversion = 0.0001
rast <- raster(raster_path)
## Extract the raster data ----
# Remove all the negative values of the raster
# This is because negative values correspond to things like water and I dont want them to count when adding, replace with NAs
# Then trim to remove NAs
rast_positive = rast
rast_positive <- clamp(rast, lower=0, useValues=FALSE)
rast_trimmed = trim(rast_positive)
# Multiply the kg/ha with the area of each cell to get the total kgs
rast_total_pesticide_application = rast_trimmed * area(rast_trimmed)
# Get the mean applcation rate (kg/ha) for each country
mean_application_rate_extract = raster::extract(rast_trimmed, wrld_simpl, fun = mean, na.rm = TRUE) # sp = T for keeping original dataframe
# Get the total applcation rate (kg/ha) for each country
total_application_extract = raster::extract(rast_total_pesticide_application, wrld_simpl, fun = sum, na.rm = TRUE)
# Get the total area
total_area_extract = raster::extract(area(rast_trimmed), wrld_simpl, fun = sum, na.rm = TRUE)
## Create the data frame ----
# Put the mean pesticide use per country in a dataframe and name the column after the pesticide
# NaNs created because all raster cells are NA in country polygon, replace with 0
mean_application_rate_extract[is.nan(mean_application_rate_extract)] = 0
mean_application_rate = data.frame(mean_application_rate = mean_application_rate_extract)
total_application = data.frame(total_application = total_application_extract)
total_area = data.frame(total_area = total_area_extract)
world_data = data.frame(world_data_empty, mean_application_rate, total_application, total_area) %>%
as_tibble()
# Check calculations by selecting a few countries and multiplying apr*area
world_data %>%
filter(iso3c %in% c("CHN", "BRA", "USA")) %>%
mutate(total_application_calc = mean_application_rate * total_area)
Here is the output
country iso3c mean_application_rate total_application total_area total_application_calc
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 Brazil BRA 2.00 4253187. 84469. 168653.
2 China CHN 2.09 9153007. 93254. 194736.
3 United States USA 1.93 5070446. 93889. 181164.
So there are a few problems here. First is that total_application_calc should be equal to total application (as it is the application rate (kg/ha) multiplied by total area (ha).
But the problem is also that the total application seem to be of with at lest one order of magnitude. According to this data the total application of glyphosate on soybeans in 2014 was 122,473,987 pounds which is equal to 55,553,266 kg compared to the 5,070,446 that I get from this dataset. It's ok if it is slightly off as they are different sources with different assumptions, but not this much.
Can anyone help where I might have done things wrong?
I have simplified your code a bit, and I think the numbers make more sense now. I cannot speak to the external validity question.
library(raster)
library(maptools)
data(wrld_simpl)
r <- raster("https://raw.github.com/hansronald/Pesticide-data/master/APR_Soybean_Glyphosate_2015_L.tif")
r <- clamp(r, lower=0, useValues=FALSE)
# area in ha
a <- area(r) * 100
mean_app <- raster::extract(r, wrld_simpl, fun = mean, na.rm = TRUE)
rtot <- r * a
tot_app <- raster::extract(rtot, wrld_simpl, fun = sum, na.rm = TRUE)
I think you made a mistake here. You need to use the cells that are not NA only.
rarea <- mask(a, r)
tot_area <- raster::extract(rarea, wrld_simpl, fun = sum, na.rm = TRUE)
## not
## tot_area <- raster::extract(area(r), wrld_simpl, fun = sum, na.rm = TRUE)
Inspect the results
w <- data.frame(country = wrld_simpl$NAME, iso3c = wrld_simpl$ISO3,
mean_app=mean_app, tot_app=tot_app, tot_area=tot_area)
w$tot_calc <- w$tot_area * w$mean_app
w[is.na(w)] <- 0
w[w$iso3c %in% c("CHN", "BRA", "USA"), ]
# country iso3c mean_app tot_app tot_area tot_calc
#21 Brazil BRA 1.996631 425318662 213982181 427243427
#30 China CHN 2.088219 915300667 439036703 916804725
#209 United States USA 1.929559 507044556 263544939 508525402
global_app <- cellStats(rtot, "sum")
global_app
# 2375398749
sum(w$tot_app)
# 2367120358