Search code examples
rdplyrtime-seriestibble

Adding extra rows to tibble /dataframes by ID combination with formula (or Time series like operations on tidy tibble)


Certain time-series operations seem to be less intuitive to complete in a tidy data form in R than they would be either with matrices arrays or with a wide form (with years one of the dimensions) eg in Excel. For example, I would like to add 2035 data as a function of previous emissions Emissions_2035=Emissions_2030+(Emissions_2030-Emissions_2025) but in the 'tidy' data form below. ((Or I might want a 'lagged operator' for tibbles with years as one of the ID variables.)) How could I create an easy consistent way of adding 2035 data for each sector/region combination?

Do I need some sort of time series object to do this?

Emissions = tribble(
~Sector, ~CountryOrRegion,~Year, ~Emissions,
"Transport","Africa","2025","452627",
"Transport","Africa","2030","546313",
"Buildings","Africa","2025","52627",
"Buildings","Africa","2030","46313",
"Transport","Europe","2025","652627",
"Transport","Europe","2030","746313",
#etc...
)

Desired output:

Emissions = tribble(
~Sector, ~CountryOrRegion,~Year, ~Emissions,
"Transport","Africa","2025","452627",
"Transport","Africa","2030","546313",
"Transport","Africa","2035","641234",
"Buildings","Africa","2025","52627",
"Buildings","Africa","2030","46313",
"Buildings","Africa","2035","41234",
"Transport","Europe","2025","652627",
"Transport","Europe","2030","746313",
"Transport","Europe","2035","841234",
#etc...

)

Solution

  • You can try -

    library(dplyr)
    
    #change character to numbers
    Emissions <- Emissions %>% type.convert(as.is = TRUE) 
    
    Emissions %>%
      arrange(Sector, CountryOrRegion, Year) %>%
      group_by(Sector, CountryOrRegion) %>%
      summarise(Year = 2035,
                Emissions = last(Emissions) + diff(Emissions), .groups = 'drop')  %>%
      bind_rows(Emissions) %>%
      arrange(Sector, CountryOrRegion, Year)