Search code examples
rtime-serieshour

Obtaining hourly average data from 1 minute dataframe


I have a data set in 1 minute interval, but I am looking for a way to convert it to hourly average. I am new to R programming for data analysis. Below is an example of how my data looks.

Please if there are other easy ways besides using R to solve this issue, kindly specify. I hope to hear from anyone soon

    TimeStamp         TSP PM10  PM2.5 PM1  T   RH
1   01/12/2022 14:08 44.3 14.2  6.97 3.34 32.9 53.2
2   01/12/2022 14:09 40.3 16.9  7.10 3.52 33.1 53.1
3   01/12/2022 14:10 36.5 15.6  7.43 3.64 33.2 53.1
4   01/12/2022 14:11 33.0 16.5  7.29 3.40 33.2 52.6
5   01/12/2022 14:12 41.3 18.2  7.73 3.41 33.3 52.9
6   01/12/2022 14:13 38.5 16.3  7.54 3.44 33.3 53.3
7   01/12/2022 14:14 38.5 18.5  6.80 3.14 33.2 53.6
8   01/12/2022 14:15 30.7 17.1  6.86 3.33 33.2 53.7
9   01/12/2022 14:16 32.5 18.3  8.56 4.42 33.3 53.5
10  01/12/2022 14:17 26.4 15.6  9.34 4.70 33.4 53.0
11  01/12/2022 14:18 23.8 14.6  7.56 3.97 33.4 52.5
12  01/12/2022 14:19 18.1 11.4  6.15 3.08 33.4 51.7
13  01/12/2022 14:20 22.4 12.2  6.43 3.49 33.5 50.9
14  01/12/2022 14:21 17.9 12.9  6.03 3.15 33.6 50.9
15  01/12/2022 14:22 18.6 12.8  5.87 3.19 33.7 50.7
16  01/12/2022 14:23 22.3 10.7  5.49 2.74 33.7 50.6
17  01/12/2022 14:24 18.1  9.2  4.87 2.52 33.7 49.9
18  01/12/2022 14:25 19.2 13.0  5.12 2.65 33.7 50.2
19  01/12/2022 14:26 19.0 10.3  5.01 2.78 33.9 50.0
20  01/12/2022 14:27 20.0 10.3  4.78 2.57 34.0 49.4
21  01/12/2022 14:28 14.1  9.6  4.71 2.45 34.1 49.0
22  01/12/2022 14:29 16.1 10.3  4.83 2.68 34.1 48.9
23  01/12/2022 14:30 13.9 10.0  5.21 2.99 34.2 49.5
24  01/12/2022 14:31 27.3 11.5  5.90 2.94 34.2 49.7
25  01/12/2022 14:32 23.8 12.8  5.77 2.97 34.2 49.6
26  01/12/2022 14:33 19.3 12.4  5.92 3.29 34.3 49.6
27  01/12/2022 14:34 30.9 14.4  6.10 3.22 34.3 49.3
28  01/12/2022 14:35 30.5 15.0  5.73 2.98 34.3 49.9
29  01/12/2022 14:36 24.7 13.9  6.17 3.17 34.3 50.0
30  01/12/2022 14:37 27.0 12.3  6.16 3.14 34.2 50.2
31  01/12/2022 14:38 27.0 12.4  5.65 3.28 34.2 50.3
32  01/12/2022 14:39 22.2 12.5  5.51 3.10 34.2 50.2
33  01/12/2022 14:40 19.0 11.6  5.46 3.06 34.1 50.3
34  01/12/2022 14:41 24.3 14.3  5.45 3.01 34.1 50.2
35  01/12/2022 14:42 17.6 10.9  5.64 3.30 34.1 50.5
36  01/12/2022 14:43 20.9 10.1  5.80 3.26 34.0 51.0
37  01/12/2022 14:44 19.0 11.7  5.93 3.27 33.9 50.9
38  01/12/2022 14:45 25.7 15.6  6.20 3.40 33.9 51.1
39  01/12/2022 14:46 20.1 14.4  6.08 3.39 34.0 51.3
40  01/12/2022 14:47 14.8 11.1  5.91 3.44 34.1 50.9

I have tried several methods I got via my research but non seems to work for me. Below are the codes I have tried

ref.data.hourly <- ref.data %>%
  group_by(hour = format (as.POSIXct(cut(TimeStamp, break = "hour")), "%H")) %>%
  summarise(meanval = mean(val, na.rm = TRUE))

I have also tried this

ref.data$TimeStamp <- as.POSIXct(ref.data$TimeStamp, format = "%d/%m/%Y %H:%M")
ref.data.xts$TimeStamp <- NULL

ref.data$TimeStamp <- strptime(ref.data$TimeStamp, "%d/%m/%Y %H:%M")

ref.data$group <- cut(ref.data$TimeStamp, breaks = "hour")

Solution

  • Your first attempt seems sensible to me. Lacking further info about your data or a specific error message, I assume the problem is handling the date-time formatting (or actually using cut() with date-time values).

    A workaround is to convert the dates to character (if they aren't yet) and then just omit the minutes. Given that as.character(ref.data$timeStamp) is consistently formatted like e.g. 01/12/2022 14:08, you can do the following:

    ref.data.hourly <- ref.data %>%
      mutate(hour_grps = substr(as.character(TimeStamp), 1, 13)) %>%
      group_by(hour_grps) %>%
      summarise(meanval = mean(val, na.rm = TRUE))
    

    I don't think this is good practice because it will break if you use the same code on slightly different formatted data. For instance, if the code were used on a computer with different locale, the date-time formatting used with as.character() may change. So please consider this a quick fix, not a permanent solution.