Search code examples
rvariablessumaggregatelevels

summing rows by combining levels R


I would like to ask you a challenge on this data frame. I do not know how to aggregate() rows in summing by combining two levels of "year factor". Find a model of my df.

placette year                   SP1               SP2  ... SPX
1         1 2013                 43                 4      ...
2         2 2013                 30                 0      ...
3         3 2013                 23                 3      ... 
4         1 2014                  0                 0      ...
5         2 2014                  2                 2      ...  
6         3 2014                  5                 0
7         1 2015                 16                 3
8         2 2015                 16                 1
9         3 2015                 20                 0
10        1 2016                 54                 4
11        2 2016                 51                 2
12        3 2016                 51                 0
 

I need to sum values of variables SP1, SP2, to SPX by period. For example, for 2 years (2013+2014) and three years (2013+2014+2015)...

I will expect this :

placette period                 SP1               SP2    ... SPX
1         1 2(2013+2014)         43                 4    ... ...
2         1 3(13+14+15)          59                 7    ... ...
3         1 4 (13+14+15+16)     113                11    ... ...
4         2 2 (13+14)            32                 2    ... ...
5         ... ... ...            ...              ...    ... ...

in keeping my factor "site".

I am working on R and I want to resolve it.

Thank you so much to help me.

Regards, Thomas.


Solution

  • Using dplyr, we arrange by 'placette', 'year', grouped by 'placette', get the cumsum of variables whose names starts_with 'SP'

    library(dplyr)
    data %>% 
       arrange(placette, year) %>%
       group_by(placette) %>% 
       mutate_at(vars(starts_with("SP")), cumsum)
    # A tibble: 12 x 4
    # Groups:   placette [3]
    #  placette  year   SP1   SP2
    #      <int> <int> <int> <int>
    # 1        1  2013    43     4
    # 2        1  2014    43     4
    # 3        1  2015    59     7
    # 4        1  2016   113    11
    # 5        2  2013    30     0
    # 6        2  2014    32     2
    # 7        2  2015    48     3
    # 8        2  2016    99     5
    # 9        3  2013    23     3
    #10        3  2014    28     3
    #11        3  2015    48     3
    #12        3  2016    99     3
    

    data

    data <- structure(list(placette = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 
    1L, 2L, 3L), year = c(2013L, 2013L, 2013L, 2014L, 2014L, 2014L, 
    2015L, 2015L, 2015L, 2016L, 2016L, 2016L), SP1 = c(43L, 30L, 
    23L, 0L, 2L, 5L, 16L, 16L, 20L, 54L, 51L, 51L), SP2 = c(4L, 0L, 
    3L, 0L, 2L, 0L, 3L, 1L, 0L, 4L, 2L, 0L)), class = "data.frame",
    row.names = c(NA, 
    -12L))