(This is my first post in Stackoverflow, so please let me know if there's any basic mistakes I made in this question)
Hi,
I have a hourly-measured data of air quality over multiple days in R, and I would like to calculate the average air quality at specific time periods.
Here's a subset of my reproducible data. It is in xts format.
# Make a structure of data
dput(Air_sample[1:6,1:1])
# Create a data from the structure above.
Air <- structure(
c(2.6, 2, 2.2, 2.2, 1.6, 1.2),
class = c("xts", "zoo"),
index = structure(
c(
1078959600,
1078963200,
1079046000,
1079049600,
1079132400,
1079136000
),
tzone = "",
tclass = c("POSIXct",
"POSIXt")
),
.Dim = c(6L, 1L),
.Dimnames = list(NULL, c("True.CO")))
> Air
True.CO
2004-03-10 18:00:00 2.6
2004-03-10 19:00:00 2.0
2004-03-11 18:00:00 2.2
2004-03-11 19:00:00 2.2
2004-03-12 18:00:00 1.6
2004-03-12 19:00:00 1.2
I want to calculate average CO at specific time (ex. 6PM) from multiple days. So the result would be something like below.
Air_average <- data.frame("Time" = c("18:00","19:00"), "Average CO" = c(2.1333,1.8))
> Air_average
Time Average.CO
1 18:00 2.1333
2 19:00 1.8000
I tried different function via Googling such as "period.apply", "subset", "window", etc. But none of them seems to work.
Is there any way to do this?
Thank you.
You can use something like dplyr
to do grouping operation, and lubridate
to deal with dates. lubridate
has the hour
function which return only the hours.
I first convert your data into a data frame:
library(lubridate)
library(dplyr)
library(xts)
Air <- data.frame(Air) %>%
add_rownames(var = "time")
time True.CO
<chr> <dbl>
1 2004-03-11 00:00:00 2.6
2 2004-03-11 01:00:00 2
3 2004-03-12 00:00:00 2.2
4 2004-03-12 01:00:00 2.2
5 2004-03-13 00:00:00 1.6
6 2004-03-13 01:00:00 1.2
Because of my timezone, the hours are not the same than yours, but the code will be the same.
Air %>%
group_by(hour(time))%>%
summarise(mean(True.CO))
# A tibble: 2 x 2
`hour(time)` `mean(True.CO)`
<int> <dbl>
1 0 2.13
2 1 1.8