Search code examples
rsumrowscalculated-columns

R - sum a set number of rows from same column in a different data frame


I have the following data frame:

    df <- data.frame( year = c(1985,1986,1987,1988,1989,1990,
                       1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,
                       2001,2002,2003,2004,2005,2006,2007,2008,2009, 2010,
                       2011,2012, 2013,2014,2015,2016,2017,2018,2019,2020),

                     value = c(0,5,10,2,6,7,3,4,5,9,10,6,8,7,3,5,2,10,9,6,5,10,4,7,8,10,
                        4,6,8,9,2,3,7,6,2,1))

I want to create a second data frame (df2) that consists of 20 years intervals from the previous data frame, i.e.

df2 <- data.frame(year=c("1985-2005", "1986-2006","1987-2007", "1988-2008","1989-2009",
                         "1990-2010", "1991-2011","1992-2002", "1993-2003","1994-2004",
                         "1995-2005", "1996-2006","1997-2007", "1998-2008", "1999-2009",
                         "2000-2020"))

Now the value for df2 should be the sum of value on df for 20 years intervals (i.e., for year "1985-2005" in df2, the value is the sum of values from 1985 until 2005 in df - Excel snapshot attached with final values)

enter image description here

How can I perform this calculation? Also any possible automation to define the year interval in df2 without having to type it?


Solution

  • A possible solution:

    library(tidyverse)
    
    df <- data.frame( year = c(1985,1986,1987,1988,1989,1990,
                           1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,
                           2001,2002,2003,2004,2005,2006,2007,2008,2009, 2010,
                           2011,2012, 2013,2014,2015,2016,2017,2018,2019,2020),
    
                         value = c(0,5,10,2,6,7,3,4,5,9,10,6,8,7,3,5,2,10,9,6,5,10,4,7,8,10,
                            4,6,8,9,2,3,7,6,2,1))
    
    df2 <- data.frame(year=c("1985-2005", "1986-2006","1987-2007", "1988-2008","1989-2009",
                             "1990-2010", "1991-2011","1992-2002", "1993-2003","1994-2004",
                             "1995-2005", "1996-2006","1997-2007", "1998-2008", "1999-2009",
                             "2000-2020"))
    
    df2 %>% 
      separate(year, into = c("y1", "y2"), sep="-", convert = T, remove = F) %>% 
      rowwise %>% 
      mutate(value = sum(df$value[df$year >= y1 & df$year <= y2])) %>% 
      select(-y1, -y2) %>% ungroup
    
    #> # A tibble: 16 × 2
    #>    year      value
    #>    <chr>     <dbl>
    #>  1 1985-2005   122
    #>  2 1986-2006   132
    #>  3 1987-2007   131
    #>  4 1988-2008   128
    #>  5 1989-2009   134
    #>  6 1990-2010   138
    #>  7 1991-2011   135
    #>  8 1992-2002    69
    #>  9 1993-2003    74
    #> 10 1994-2004    75
    #> 11 1995-2005    71
    #> 12 1996-2006    71
    #> 13 1997-2007    69
    #> 14 1998-2008    68
    #> 15 1999-2009    69
    #> 16 2000-2020   124