Search code examples
rdataframesumcol

Sum column in a DataFrame in R


I am trying to add a sum column to a large file that has dates in it. I want to sum every month and add a column to the right of the last column of that month.

Below is a reproducible example:

df <- data.frame("6Jun06" = c(4, 5, 9),
    "13Jun06" = c(4, 5, 9),
    "20Jun06" = c(4, 5, 9),
    "03Jul16" = c(1, 2, 3),
    "09Jul16" = c(1, 2, 3),
    "01Aug16" = c(1, 2, 5))

So in this case I would need to have three columns (after Jun, Jul, and Aug).

  X6.Jun.06 X13.Jun.06 X20.Jun.06 Jun.Sum X03.Jul.16 X09.Jul.16 Jul.Sum X01.Aug.16 Aug.Sum
1         4          4          4     Sum          1          1     Sum          1     Sum
2         5          5          5     Sum          2          2     Sum          2     Sum
3         9          9          9     Sum          3          3     Sum          5     Sum

I am not sure how to sum every month individually. I know there are build-in sum functions but the functions that I tried do not fit to my problem because they just do a general sum.


Solution

  • If you are new to R, a good start is taking a look at the dplyr ecosystem (as well as other packages by Hadley Wickham).

    library(dplyr)
    library(tidyr)
    
    df %>%
       mutate(id = 1:nrow(df)) %>%
       gather(date, value, -id) %>%
       mutate(Month = month.abb[apply(sapply(month.abb, function(mon) {grepl(mon, .$date)}), 1, which)]) %>%
       group_by(id, Month) %>%
       summarize(sum = sum(value)) %>%
       spread(Month, sum) %>%
       left_join(mutate(df, id = 1:nrow(df)), .) %>%
       select(-id)