
R: Summarize rows per month


I have a data frame with one date column and several numeric columns. I want to group it by month and sum the numeric values in the other columns for each month.

Here is my dataframe example:

capture.date  Test1  Test2  Test3
2016-03-18      0      1      1
2016-03-18      1      1      1
2016-03-20      2      1      1
2016-04-12      1      0      1

I already tried some code:

df %>% 
  group_by(capture.date) %>% 
  summarise_each(funs(sum))

and:

aggregate(df[2:4], by=df["capture.date"], sum)

but both of these return data frames summarized by day rather than by month. How can I summarize by month instead?

desired output:

capture.date  Test1  Test2  Test3
2016-03         3      3      3     
2016-04         1      0      1

Solution

  • You can format the dates as %Y-%m inside group_by() and use summarise_if() or summarise_at() to select which variables get summed.

    (Confirm that capture.date is Date class)

    df %>%
      group_by(Date = strftime(capture.date, "%Y-%m")) %>%
      summarise_if(is.numeric, sum)
    
    # # A tibble: 2 x 4
    #   Date    Test1 Test2 Test3
    #   <chr>   <int> <int> <int>
    # 1 2016-03     3     3     3
    # 2 2016-04     1     0     1
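
The class check mentioned above can be sketched as follows. This is a minimal base R example, assuming the dates arrived as character strings in ISO format; the data frame is rebuilt from the question's sample rows so the snippet runs on its own, and month_key is a hypothetical name used only for illustration.

```r
# Sample rows from the question, with the date column stored as character
# (an assumption about how the data might have been read in).
df <- data.frame(
  capture.date = c("2016-03-18", "2016-03-18", "2016-03-20", "2016-04-12"),
  Test1 = c(0L, 1L, 2L, 1L),
  Test2 = c(1L, 1L, 1L, 0L),
  Test3 = c(1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Inspect the current class of the column.
class(df$capture.date)

# Convert to Date, stating the expected format explicitly.
df$capture.date <- as.Date(df$capture.date, format = "%Y-%m-%d")

# Confirm the conversion before grouping.
stopifnot(inherits(df$capture.date, "Date"))

# The year-month key that group_by() would then use.
month_key <- format(df$capture.date, "%Y-%m")
```

Stating the format explicitly means that a value which does not match it becomes NA rather than being parsed under a guessed format, so a quick anyNA(df$capture.date) after the conversion can catch malformed rows.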
    

    Update!

    Scoped verbs (_if, _at, _all) have been superseded by the use of pick() or across() in an existing verb.

    df %>%
      group_by(Date = strftime(capture.date, "%Y-%m")) %>%
      summarise(across(where(is.numeric), sum))
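
For comparison, the aggregate() attempt from the question can be adapted in the same way, by grouping on a formatted month string instead of the raw date. This is a base R sketch, assuming capture.date is already of class Date; the month column and the monthly name are introduced here for illustration only.

```r
# Sample rows from the question, with capture.date as a Date column.
df <- data.frame(
  capture.date = as.Date(c("2016-03-18", "2016-03-18", "2016-03-20", "2016-04-12")),
  Test1 = c(0L, 1L, 2L, 1L),
  Test2 = c(1L, 1L, 1L, 0L),
  Test3 = c(1L, 1L, 1L, 1L)
)

# Derive a year-month key; format() on a Date returns a character vector.
df$month <- format(df$capture.date, "%Y-%m")

# Sum the numeric columns within each month (one output row per month).
monthly <- aggregate(
  df[c("Test1", "Test2", "Test3")],
  by = df["month"],
  FUN = sum
)

print(monthly)
```

The only change from the original attempt is the grouping variable: the daily date is replaced by a key that is identical for all rows in the same month.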