Search code examples
rdplyrtime-seriestidyverseaggregation

Aggregating data at 15 minutes interval based on date and hour in R


I have data as follows

Time <- c("2021-08-30 7:24","2021-08-30 7:30","2021-08-30 7:54","2021-08-30 8:16","2021-08-30 8:27","2021-08-30 8:22","2021-08-31 2:39","2021-08-31 2:44","2021-08-31 2:50","2021-08-31 2:56","2021-08-31 7:42","2021-08-31 7:45","2021-08-31 7:50","2021-08-31 6:02")
Distance_m <- c(162,162,162,162,162,162,162,157,150,137,122,102,78,42)
df <- data.frame(Time, Distance_m)
df
              Time Distance_m
1  2021-08-30 7:24        162
2  2021-08-30 7:30        162
3  2021-08-30 7:54        162
4  2021-08-30 8:16        162
5  2021-08-30 8:27        162
6  2021-08-30 8:22        162
7  2021-08-31 2:39        162
8  2021-08-31 2:44        157
9  2021-08-31 2:50        150
10 2021-08-31 2:56        137
11 2021-08-31 7:42        122
12 2021-08-31 7:45        102
13 2021-08-31 7:50         78
14 2021-08-31 6:02         42

I Want to sum the Distance_m based on 15 minutes intervals based on date and hour.

I am expecting the output as follows

Date    Hour    Time    Distance_m
2021-08-30  7   54  486
2021-08-30  8   30  486
2021-08-31  2   56  606
2021-08-31  6   2   344

So far I have tried

df <- tidyr::separate(df, Time, c("Date", "Time"), sep = " ")
df1<- df %>%
  mutate(Time = hm(Time)) %>%
  mutate(ttt= (lubridate::minute(Time) + lubridate::hour(Time) * 60)) %>%
  mutate(tt = floor(ttt/15) ) %>%
  group_by(tt) %>%
  summarize(Date = last(Date),Time = last(Time), Distance_m = sum(Distance_m))

But the output is a bit messy. I am hoping to find an efficient way as I am dealing with a huge data.

Thank you


Solution

  • Not exactly giving your expected results though, but perhaps usable. You can see if this fits your needs.

    library(data.table)
    setDT(df)
    
    df[, Time := ymd_hm(Time)]
    df[, groups := lubridate::round_date(Time, "15 minutes")]
    df[, .(Distance_m_sum = sum(Distance_m)), by = groups]
    
                   groups Distance_m_sum
    1: 2021-08-30 07:30:00            324
    2: 2021-08-30 08:00:00            162
    3: 2021-08-30 08:15:00            324
    4: 2021-08-30 08:30:00            162
    5: 2021-08-31 02:45:00            469
    6: 2021-08-31 03:00:00            137
    7: 2021-08-31 07:45:00            302
    8: 2021-08-31 06:00:00             42
    

    More extended example

    You have to define your quarters I think, there are with the lubridate approach three options, round_date, floor_date and ceiling_date. Rethinking my own example I would pick floor_date as 2021-08-30 7:24 falls in the 7:15-7:30 group. To see all variants:

    library(data.table)
    setDT(df)
    
    df[, Time := ymd_hm(Time)]
    df[, round_date := lubridate::round_date(Time, "15 minutes")]
    df[, floor_date := lubridate::floor_date(Time, "15 minutes")]
    df[, ceiling_date := lubridate::ceiling_date(Time, "15 minutes")]
    
    df[, .(Distance_m_sum = sum(Distance_m)), by = round_date]
                round_date Distance_m_sum
    1: 2021-08-30 07:30:00            324
    2: 2021-08-30 08:00:00            162
    3: 2021-08-30 08:15:00            324
    4: 2021-08-30 08:30:00            162
    5: 2021-08-31 02:45:00            469
    6: 2021-08-31 03:00:00            137
    7: 2021-08-31 07:45:00            302
    8: 2021-08-31 06:00:00             42
    
    df[, .(Distance_m_sum = sum(Distance_m)), by = floor_date]
            floor_date Distance_m_sum
    1: 2021-08-30 07:15:00            162
    2: 2021-08-30 07:30:00            162
    3: 2021-08-30 07:45:00            162
    4: 2021-08-30 08:15:00            486
    5: 2021-08-31 02:30:00            319
    6: 2021-08-31 02:45:00            287
    7: 2021-08-31 07:30:00            122
    8: 2021-08-31 07:45:00            180
    9: 2021-08-31 06:00:00             42
    
    df[, .(Distance_m_sum = sum(Distance_m)), by = ceiling_date]
              ceiling_date Distance_m_sum
    1: 2021-08-30 07:30:00            324
    2: 2021-08-30 08:00:00            162
    3: 2021-08-30 08:30:00            486
    4: 2021-08-31 02:45:00            319
    5: 2021-08-31 03:00:00            287
    6: 2021-08-31 07:45:00            224
    7: 2021-08-31 08:00:00             78
    8: 2021-08-31 06:15:00             42