Our weather station recorded temperature data at 15 minutes interval. I'd like to calculate hourly min
imum, daily max
imum and daily mean
temperatures. How can I do this in R programming language, preferably in dplyr
syntax?
My data looks like this
date temperature
1 2014-05-26 11:45:00 25.0
2 2014-05-26 12:00:00 25.2
3 2014-05-26 12:15:00 25.3
4 2014-05-26 12:30:00 25.1
5 2014-05-26 12:45:00 25.4
[...]
96 2014-05-27 11:30:00 26.3
97 2014-05-27 11:45:00 25.7
98 2014-05-27 12:00:00 24.9
99 2014-05-27 12:15:00 24.9
100 2014-05-27 12:30:00 26.0
Here is the reproducible example. The dput
function changes date format and mdy_hm won't parse it properly. I think this the reason some of the answers are not working for me.
df <- structure(list(date = structure(c(1401104700, 1401105600, 1401106500,
1401107400, 1401108300, 1401109200, 1401110100, 1401111000, 1401111900,
1401112800, 1401113700, 1401114600, 1401115500, 1401116400, 1401117300,
1401118200, 1401119100, 1401120000, 1401120900, 1401121800, 1401122700,
1401123600, 1401124500, 1401125400, 1401126300, 1401127200, 1401128100,
1401129000, 1401129900, 1401130800, 1401131700, 1401132600, 1401133500,
1401134400, 1401135300, 1401136200, 1401137100, 1401138000, 1401138900,
1401139800, 1401140700, 1401141600, 1401142500, 1401143400, 1401144300,
1401145200, 1401146100, 1401147000, 1401147900, 1401148800, 1401149700,
1401150600, 1401151500, 1401152400, 1401153300, 1401154200, 1401155100,
1401156000, 1401156900, 1401157800, 1401158700, 1401159600, 1401160500,
1401161400, 1401162300, 1401163200, 1401164100, 1401165000, 1401165900,
1401166800, 1401167700, 1401168600, 1401169500, 1401170400, 1401171300,
1401172200, 1401173100, 1401174000, 1401174900, 1401175800, 1401176700,
1401177600, 1401178500, 1401179400, 1401180300, 1401181200, 1401182100,
1401183000, 1401183900, 1401184800, 1401185700, 1401186600, 1401187500,
1401188400, 1401189300, 1401190200, 1401191100, 1401192000, 1401192900,
1401193800), tzone = "UTC", class = c("POSIXct", "POSIXt")),
temperature = c(25, 25.2, 25.3, 25.1, 25.4, 26, 25.9, 25.6,
26.8, 27.8, 26.8, 26, 26, 26.3, 27, 27, 26.2, 25.8, 24.9,
25.1, 26.3, 25.6, 25.3, 25.2, 25.1, 24.8, 24.7, 24, 23, 22.7,
22.5, 22.5, 22.2, 21.9, 21.5, 21.1, 20.8, 20.5, 20.3, 20.3,
20.2, 20, 19.8, 19.6, 19.2, 19.1, 19.1, 18.9, 18.8, 18.6,
18.3, 18.2, 18.2, 18.2, 18.1, 17.9, 17.8, 17.7, 17.8, 18,
18.1, 18, 18.1, 18.6, 18.7, 18.5, 18.3, 18.1, 18.1, 18.6,
18.8, 18.6, 18.6, 18.3, 18.2, 18, 17.8, 18, 18.2, 18.9, 19.8,
19.6, 19.5, 19.7, 20.2, 21.5, 22.4, 23, 24, 23.3, 23.2, 23.7,
24.5, 24.8, 24.9, 26.3, 25.7, 24.9, 24.9, 26)), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"))
df$date = as.POSIXct(df$date)
I used the following code to convert data recorded at 15 minutes to daily, but I'd like to change to hourly.
library(dplyr)
library(lubridate)
df %>%
group_by(date = as.Date(date, "%m/%d/%Y")) %>% # how to group by hour?
summarise(min_temp = min(temperature, na.rm = TRUE),
max_temp = max(temperature, na.rm = TRUE),
mean_temp = mean(temperature, na.rm = TRUE)) %>%
ungroup()
An approach with round_date
to get hourly grouping. Using toy data.
(... to make sure your date is of class date
use: df$date <- as.POSIXct(df$date, format="%m/%d/%Y %I:%M %p")
)
EDIT, using posted df
library(dplyr)
library(lubridate)
df %>%
group_by(grp = cumsum(format(round_date(date, "hour"), "%H:%M") ==
format(date, "%H:%M"))) %>%
summarize(date = unique(format(date, "%m/%d/%Y %I:00 %p")),
max_temperature = max(temperature),
min_temerature = min(temperature),
mean_temperature = mean(temperature)) %>%
select(-grp)
# A tibble: 26 × 4
date max_temperature min_temerature mean_temperature
<chr> <dbl> <dbl> <dbl>
1 05/26/2014 11:00 AM 25 25 25
2 05/26/2014 12:00 PM 25.4 25.1 25.2
3 05/26/2014 01:00 PM 26.8 25.6 26.1
4 05/26/2014 02:00 PM 27.8 26 26.6
5 05/26/2014 03:00 PM 27 26.2 26.6
6 05/26/2014 04:00 PM 26.3 24.9 25.5
7 05/26/2014 05:00 PM 25.6 25.1 25.3
8 05/26/2014 06:00 PM 24.8 23 24.1
9 05/26/2014 07:00 PM 22.7 22.2 22.5
10 05/26/2014 08:00 PM 21.9 20.8 21.3
# … with 16 more rows
# ℹ Use `print(n = ...)` to see more rows
If you actually want a summarized table over all hours add one additional summarize
df %>%
group_by(grp = cumsum(format(round_date(date, "hour"), "%H:%M") ==
format(date, "%H:%M"))) %>%
summarize(Hour = unique(format(date, "%H:00")),
max_temperature = max(temperature),
min_temperature = min(temperature),
mean_temperature = mean(temperature)) %>%
select(-grp) %>%
summarize(across(ends_with("temperature"), mean), .by = Hour) %>%
arrange(Hour)
# A tibble: 24 × 4
Hour max_temperature min_temperature mean_temperature
<chr> <dbl> <dbl> <dbl>
1 00:00 18.6 18.2 18.3
2 01:00 18.2 17.8 18
3 02:00 18.1 17.7 17.9
4 03:00 18.7 18 18.4
5 04:00 18.5 18.1 18.2
6 05:00 18.8 18.6 18.7
7 06:00 18.3 17.8 18.1
8 07:00 19.8 18 18.7
9 08:00 20.2 19.5 19.8
10 09:00 24 21.5 22.7
# … with 14 more rows
# ℹ Use `print(n = ...)` to see more rows