Certain time-series operations seem to be less intuitive to complete in a tidy data form in R than they would be either with matrices arrays or with a wide form (with years one of the dimensions) eg in Excel. For example, I would like to add 2035 data as a function of previous emissions Emissions_2035=Emissions_2030+(Emissions_2030-Emissions_2025) but in the 'tidy' data form below. ((Or I might want a 'lagged operator' for tibbles with years as one of the ID variables.)) How could I create an easy consistent way of adding 2035 data for each sector/region combination?
Do I need some sort of time series object to do this?
Emissions = tribble(
~Sector, ~CountryOrRegion,~Year, ~Emissions,
"Transport","Africa","2025","452627",
"Transport","Africa","2030","546313",
"Buildings","Africa","2025","52627",
"Buildings","Africa","2030","46313",
"Transport","Europe","2025","652627",
"Transport","Europe","2030","746313",
#etc...
)
Desired output:
Emissions = tribble(
~Sector, ~CountryOrRegion,~Year, ~Emissions,
"Transport","Africa","2025","452627",
"Transport","Africa","2030","546313",
"Transport","Africa","2035","641234",
"Buildings","Africa","2025","52627",
"Buildings","Africa","2030","46313",
"Buildings","Africa","2035","41234",
"Transport","Europe","2025","652627",
"Transport","Europe","2030","746313",
"Transport","Europe","2035","841234",
#etc...
)
You can try -
library(dplyr)
#change character to numbers
Emissions <- Emissions %>% type.convert(as.is = TRUE)
Emissions %>%
arrange(Sector, CountryOrRegion, Year) %>%
group_by(Sector, CountryOrRegion) %>%
summarise(Year = 2035,
Emissions = last(Emissions) + diff(Emissions), .groups = 'drop') %>%
bind_rows(Emissions) %>%
arrange(Sector, CountryOrRegion, Year)