Search code examples
rvariablesmoving-average

R moving average between data frame variables


I am trying to find a solution but haven't, yet. I have a dataframe structured as follows:

country City 2014 2015 2016 2017 2018 2019
France  Paris 23   34   54   12   23   21
US       NYC   1    2    2    12   95  54 

I want to find the moving average for every 3 years (i.e. 2014-16, 2015-17, etc) to be placed in ad-hoc columns.

country City 2014 2015 2016 2017 2018 2019 2014-2016  2015-2017  2016-2018  2017-2019
France  Paris 23   34   54   12   23   21    37          33.3      29.7        18.7  
US       NYC   1    2    2    12   95  54    etc           etc     etc          etc

Any hint?


Solution

  • 1) Using the data shown reproducibly in the Note at the end we apply rollmean to each column in the transpose of the data and then transpose back. We rollapply the appropriate paste command to create the names.

    library(zoo)
    
    DF2 <- DF[-(1:2)]
    cbind(DF, setNames(as.data.frame(t(rollmean(t(DF2), 3))), 
      rollapply(names(DF2), 3, function(x) paste(range(x), collapse = "-"))))
    

    giving:

      country  City 2014 2015 2016 2017 2018 2019 2014-2016 2015-2017 2016-2018 2017-2019
    1  France Paris   23   34   54   12   23   21 37.000000 33.333333  29.66667  18.66667
    2      US   NYC    1    2    2   12   95   54  1.666667  5.333333  36.33333  53.66667
    

    2) This could also be expressed using dplyr/tidyr/zoo like this:

    library(dplyr)
    library(tidyr)
    library(zoo)
    
    DF %>%
      pivot_longer(-c(country, City)) %>%
      group_by(country, City) %>%
      mutate(value = rollmean(value, 3, fill = NA),
        name = rollapply(name, 3, function(x) paste(range(x), collapse="-"), fill=NA)) %>%
      ungroup %>%
      drop_na %>%
      pivot_wider %>%
      left_join(DF, ., by = c("country", "City"))
    

    Note

    Lines <- "country City 2014 2015 2016 2017 2018 2019
    France  Paris 23   34   54   12   23   21
    US       NYC   1    2    2    12   95  54 "
    DF <- read.table(text = Lines, header = TRUE, as.is = TRUE, check.names = FALSE)