Search code examples
rdatetimedplyrtidyversetidyr

Group by date then summarize and transpose data


I have a data set that looks like this:

Date hh00 hh01 hh02 hh03 hh04 hh05
2021-09-01 21 0 51 12 65 96
2021-09-01 0 13 12 34 65 34
2021-09-03 12 0 0 1 61 96
2021-09-03 45 12 51 12 11 0
2021-09-05 10 0 51 98 23 96

Where the date column is the date and the hh columns stand for the hour of observation for each day. So in the first row on 2021-09-01 there is an observation of 21 at 00:00 hours and observation of 0 at 01:00 hours etc.

I'd like transform it get a data set containing the sum of the each column by the date and hour, then organized by the datetime. So it looks like this:

Date count
2021-09-01 00:00 21
2021-09-01 01:00 13
2021-09-01 02:00 63
2021-09-01 03:00 36
2021-09-01 04:00 130
2021-09-01 05:00 130
2021-09-03 00:00 57
2021-09-03 01:00 12
2021-09-03 02:00 51
2021-09-03 03:00 13
2021-09-03 04:00 72
2021-09-03 05:00 96

Solution

  • library(tidyverse) 
    library(lubridate) # loaded with tidyverse 2.0.0+
    df |>
      pivot_longer(-Date) |>
      mutate(Date = ymd_h(paste(Date, parse_number(name)))) |>
      summarize(value = sum(value), .by = Date)
    
       Date                value
       <dttm>              <int>
     1 2021-09-01 00:00:00    21
     2 2021-09-01 01:00:00    13
     3 2021-09-01 02:00:00    63
     4 2021-09-01 03:00:00    46
     5 2021-09-01 04:00:00   130
     6 2021-09-01 05:00:00   130
     7 2021-09-03 00:00:00    57
     8 2021-09-03 01:00:00    12
     9 2021-09-03 02:00:00    51
    10 2021-09-03 03:00:00    13
    11 2021-09-03 04:00:00    72
    12 2021-09-03 05:00:00    96
    13 2021-09-05 00:00:00    10
    14 2021-09-05 01:00:00     0
    15 2021-09-05 02:00:00    51
    16 2021-09-05 03:00:00    98
    17 2021-09-05 04:00:00    23
    18 2021-09-05 05:00:00    96