my data structure is here:
str(data_df)
'data.frame': 62630 obs. of 16 variables:
$ No : int 0 1 2 3 4 5 6 7 8 9 ...
$ UTC : POSIXct, format: "2022-06-09 00:08:51" "2022-06-09 00:08:52" ...
$ Temperature[C]: num 20 20 20 20 20.1 ...
$ Humidity[%] : num 57.4 56.7 56 55.3 54.7 ...
$ TVOC[ppb] : int 0 0 0 0 0 0 0 0 0 0 ...
$ eCO2[ppm] : int 400 400 400 400 400 400 400 400 400 400 ...
$ Raw_H2 : int 12306 12345 12374 12390 12403 12419 12432 12439 12448 12453 ...
$ Raw_Ethanol : int 18520 18651 18764 18849 18921 18998 19058 19114 19155 19195 ...
$ Pressure[hPa] : num 940 940 940 940 940 ...
$ PM1.0 : num 0 0 0 0 0 0 0 0 0 0.9 ...
$ PM2.5 : num 0 0 0 0 0 0 0 0 0 3.78 ...
$ NC0.5 : num 0 0 0 0 0 0 0 0 0 0 ...
$ NC1.0 : num 0 0 0 0 0 ...
$ NC2.5 : num 0 0 0 0 0 0 0 0 0 2.78 ...
$ CNT : int 0 1 2 3 4 5 6 7 8 9 ...
$ Fire_Alarm : int 0 0 0 0 0 0 0 0 0 0 ...
The records observed by seconds in 6/8, 9, 10, 13.
I want to make 60 records(60 seconds) to 1 record(1 minute)
for example) 2022-06-09 00:08:00 ~ 2022-06-09 00:08:59 to 1 record with
making 60 records to one mean record for every variable
like mean(Temperature), mean(Humidity)... and so on.
What should I do? It is not fully continuous records.
Create an auxiliary variable for the minutes with format
and aggregate by it. Here is a dplyr
solution.
# make test data
set.seed(2022)
start <- Sys.time() - lubridate::days(2)
end <- Sys.time()
UTC <- seq(start, end, by = "1 sec")
data_df <- data.frame(
No = seq_along(UTC) - 1L,
UTC,
A = rnorm(length(UTC)),
B = runif(length(UTC)),
X = sample(4, length(UTC), TRUE)
)
suppressPackageStartupMessages(library(dplyr))
data_df %>%
mutate(UTC_mins = format(UTC, "%Y-%m-%d %H-%M")) %>%
group_by(UTC_mins) %>%
summarise(across(A:X, ~ mean(.x, na.rm = TRUE)))
#> # A tibble: 2,881 × 4
#> UTC_mins A B X
#> <chr> <dbl> <dbl> <dbl>
#> 1 2022-10-24 18-15 -0.113 0.482 2.04
#> 2 2022-10-24 18-16 0.166 0.510 2.58
#> 3 2022-10-24 18-17 0.118 0.553 2.6
#> 4 2022-10-24 18-18 -0.364 0.438 2.43
#> 5 2022-10-24 18-19 0.0428 0.554 2.65
#> 6 2022-10-24 18-20 -0.183 0.492 2.47
#> 7 2022-10-24 18-21 -0.0670 0.487 2.47
#> 8 2022-10-24 18-22 0.0950 0.537 2.65
#> 9 2022-10-24 18-23 -0.0489 0.541 2.22
#> 10 2022-10-24 18-24 0.0241 0.492 2.65
#> # … with 2,871 more rows
Created on 2022-10-26 with reprex v2.0.2
And a base R solution.
UTC_mins <- format(data_df$UTC, "%Y-%m-%d %H-%M")
agg <- aggregate(. ~ UTC_mins, data_df[-(1:2)], mean, na.rm = TRUE)
rm(UTC_mins)
head(agg)
#> UTC_mins A B X
#> 1 2022-10-24 18-19 -0.05475728 0.4931293 2.095238
#> 2 2022-10-24 18-20 0.24564784 0.5239101 2.700000
#> 3 2022-10-24 18-21 -0.19414037 0.5169572 2.500000
#> 4 2022-10-24 18-22 -0.16612046 0.4793936 2.566667
#> 5 2022-10-24 18-23 0.03426193 0.5246618 2.616667
#> 6 2022-10-24 18-24 -0.19578060 0.5014733 2.416667
Created on 2022-10-26 with reprex v2.0.2