Search code examples
raggregatedate-conversion

How to convert 60 seconds data to 1 minute data? in R


my data structure is here:

str(data_df)
'data.frame':   62630 obs. of  16 variables:
 $ No            : int  0 1 2 3 4 5 6 7 8 9 ...
 $ UTC           : POSIXct, format: "2022-06-09 00:08:51" "2022-06-09 00:08:52" ...
 $ Temperature[C]: num  20 20 20 20 20.1 ...
 $ Humidity[%]   : num  57.4 56.7 56 55.3 54.7 ...
 $ TVOC[ppb]     : int  0 0 0 0 0 0 0 0 0 0 ...
 $ eCO2[ppm]     : int  400 400 400 400 400 400 400 400 400 400 ...
 $ Raw_H2        : int  12306 12345 12374 12390 12403 12419 12432 12439 12448 12453 ...
 $ Raw_Ethanol   : int  18520 18651 18764 18849 18921 18998 19058 19114 19155 19195 ...
 $ Pressure[hPa] : num  940 940 940 940 940 ...
 $ PM1.0         : num  0 0 0 0 0 0 0 0 0 0.9 ...
 $ PM2.5         : num  0 0 0 0 0 0 0 0 0 3.78 ...
 $ NC0.5         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ NC1.0         : num  0 0 0 0 0 ...
 $ NC2.5         : num  0 0 0 0 0 0 0 0 0 2.78 ...
 $ CNT           : int  0 1 2 3 4 5 6 7 8 9 ...
 $ Fire_Alarm    : int  0 0 0 0 0 0 0 0 0 0 ...

The records observed by seconds in 6/8, 9, 10, 13.

I want to make 60 records(60 seconds) to 1 record(1 minute)

for example) 2022-06-09 00:08:00 ~ 2022-06-09 00:08:59 to 1 record with

making 60 records to one mean record for every variable

like mean(Temperature), mean(Humidity)... and so on.

What should I do? It is not fully continuous records.


Solution

  • Create an auxiliary variable for the minutes with format and aggregate by it. Here is a dplyr solution.

    # make test data
    set.seed(2022)
    start <- Sys.time() - lubridate::days(2)
    end <- Sys.time()
    UTC <- seq(start, end, by = "1 sec")
    data_df <- data.frame(
      No = seq_along(UTC) - 1L,
      UTC,
      A = rnorm(length(UTC)),
      B = runif(length(UTC)),
      X = sample(4, length(UTC), TRUE)
    )
    
    suppressPackageStartupMessages(library(dplyr))
    
    data_df %>%
      mutate(UTC_mins = format(UTC, "%Y-%m-%d %H-%M")) %>%
      group_by(UTC_mins) %>%
      summarise(across(A:X, ~ mean(.x, na.rm = TRUE)))
    #> # A tibble: 2,881 × 4
    #>    UTC_mins               A     B     X
    #>    <chr>              <dbl> <dbl> <dbl>
    #>  1 2022-10-24 18-15 -0.113  0.482  2.04
    #>  2 2022-10-24 18-16  0.166  0.510  2.58
    #>  3 2022-10-24 18-17  0.118  0.553  2.6 
    #>  4 2022-10-24 18-18 -0.364  0.438  2.43
    #>  5 2022-10-24 18-19  0.0428 0.554  2.65
    #>  6 2022-10-24 18-20 -0.183  0.492  2.47
    #>  7 2022-10-24 18-21 -0.0670 0.487  2.47
    #>  8 2022-10-24 18-22  0.0950 0.537  2.65
    #>  9 2022-10-24 18-23 -0.0489 0.541  2.22
    #> 10 2022-10-24 18-24  0.0241 0.492  2.65
    #> # … with 2,871 more rows
    

    Created on 2022-10-26 with reprex v2.0.2

    And a base R solution.

    UTC_mins <- format(data_df$UTC, "%Y-%m-%d %H-%M")
    agg <- aggregate(. ~ UTC_mins, data_df[-(1:2)], mean, na.rm = TRUE)
    rm(UTC_mins)
    
    head(agg)
    #>           UTC_mins           A         B        X
    #> 1 2022-10-24 18-19 -0.05475728 0.4931293 2.095238
    #> 2 2022-10-24 18-20  0.24564784 0.5239101 2.700000
    #> 3 2022-10-24 18-21 -0.19414037 0.5169572 2.500000
    #> 4 2022-10-24 18-22 -0.16612046 0.4793936 2.566667
    #> 5 2022-10-24 18-23  0.03426193 0.5246618 2.616667
    #> 6 2022-10-24 18-24 -0.19578060 0.5014733 2.416667
    

    Created on 2022-10-26 with reprex v2.0.2