Tags: r, time-series, posixct

Calculate mean and std of certain attributes for every segment of a time-series in R


This is the head() of my dataset:

df <- data.frame(
  X = c(
    243813.672143309, 243820.16680888, 243819.847679243, 243816.851755806,
    243814.016524682, 243817.173014157
  ),
  Y = c(
    717413.771532459, 717412.74899267, 717412.77789073, 717414.049964481,
    717415.983508272, 717414.873097992
  ),
  T = as.POSIXct(
    c(
      "2021-04-01 21:30:06.186", "2021-04-01 21:30:14.186",
      "2021-04-01 21:30:22.186", "2021-04-01 21:30:30.185",
      "2021-04-01 21:30:38.185", "2021-04-01 21:30:46.185"
    ),
    tz = "GMT"
  ),
  sp = c(
    0, 6.57466869906985, 0.320435364660776, 3.25480089593961, 3.43178191624026,
    3.34610770929176
  ),
  ta = c(0, 0, 0.0658546845459325, 0.311226675793708, 0.196989706737039, 0.260257380057078),
  row.names = 1688614:1688619
)


My objective is to split the time series into fixed-length windows of the T (time) column (say, every 3 minutes) and calculate the mean and standard deviation of the sp and ta columns within each window.

I don't have working code for this yet. I was thinking of looping over a sequence from head(df$T, 1) to tail(df$T, 1) in 3-minute steps, extracting the records in each segment, and calculating the mean and standard deviation of certain columns. But I suspect this isn't the best way to approach the problem in R.
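For comparison, the loop described above could be sketched in base R roughly as follows. This is a minimal sketch, not the accepted approach: it keeps only the T, sp and ta columns from the sample data, and it assumes half-open windows [start, start + 3 min) that begin at the exact first timestamp. An empty window would give NaN for the mean and NA for the standard deviation.

```r
# Subset of the question's data: only the columns the summary needs.
df <- data.frame(
  T = as.POSIXct(c("2021-04-01 21:30:06.186", "2021-04-01 21:30:14.186",
                   "2021-04-01 21:30:22.186", "2021-04-01 21:30:30.185",
                   "2021-04-01 21:30:38.185", "2021-04-01 21:30:46.185"),
                 tz = "GMT"),
  sp = c(0, 6.57466869906985, 0.320435364660776, 3.25480089593961,
         3.43178191624026, 3.34610770929176),
  ta = c(0, 0, 0.0658546845459325, 0.311226675793708,
         0.196989706737039, 0.260257380057078)
)

window <- 3 * 60  # window length in seconds (3 minutes)

# Window start times: from the first timestamp towards the last, every 3 minutes.
starts <- seq(min(df$T), max(df$T), by = window)

out <- vector("list", length(starts))
for (i in seq_along(starts)) {
  s <- starts[i]
  in_win <- df$T >= s & df$T < s + window  # half-open window [s, s + 3 min)
  out[[i]] <- data.frame(
    start   = s,
    n       = sum(in_win),
    sp_mean = mean(df$sp[in_win]), sp_sd = sd(df$sp[in_win]),
    ta_mean = mean(df$ta[in_win]), ta_sd = sd(df$ta[in_win])
  )
}

# Stack the per-window summaries into one data frame.
res <- do.call(rbind, out)
res
```

With the six sample rows this produces a single window, because all timestamps fall within 40 seconds of each other.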

Any help is appreciated. Using R 4.2.1.


Solution

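  • You could do this with cut(), which bins the POSIXct times into fixed-length intervals, and then summarise each bin with dplyr: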

    library(dplyr)
    
    df |> 
      group_by(cut(T, breaks = "3 min")) |> 
      summarise(across(c(sp, ta), list(mean = mean, sd = sd)))
    

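    By the way, your sample data spans less than one minute (21:30:06 to 21:30:46), so it all falls into a single 3-minute bin. Here is an example with the flights dataset from nycflights13 instead.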

    library(dplyr)
    library(nycflights13) # provides the flights dataset
    
    # Build a departure datetime: time_hour is the scheduled hour (POSIXct)
    # and minute is the scheduled minute, so add minute * 60 seconds.
    flights <- flights |>
      mutate(datetime = time_hour + minute * 60)
    
    flights |> 
      group_by(cut(datetime, breaks = "3 min")) |> 
      summarise(across(c(air_time, distance), list(mean = ~mean(., na.rm = TRUE),
                                                 sd = ~sd(., na.rm = TRUE))))
    

    output

     cut(datetime, breaks = "3 min") air_time_mean air_time_sd distance_mean distance_sd
    1             2013-01-01 05:15:00       227.000          NA      1400.000          NA
    2             2013-01-01 05:27:00       227.000          NA      1416.000          NA
    3             2013-01-01 05:39:00       160.000          NA      1089.000          NA
    4             2013-01-01 05:45:00       183.000          NA      1576.000          NA
    5             2013-01-01 05:57:00        97.000    74.95332       453.000    376.1808
    6             2013-01-01 06:00:00       196.875   100.59548      1273.941    724.5070
    

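    Bins with no flights are dropped from the result, and bins with fewer than two non-missing values (such as the first four rows above) get NA for the standard deviation, because sd() needs at least two observations.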
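If you would rather avoid dplyr, the same cut() idea can be sketched in base R with aggregate(). This is a swapped-in alternative, not part of the answer above, and it uses only the T, sp and ta columns from the question. cut() with breaks = "3 min" starts its first bin at the earliest timestamp rounded down to the whole minute (here 21:30:00), unlike a loop that starts at the exact first timestamp.

```r
# Subset of the question's data: only the columns the summary needs.
df <- data.frame(
  T = as.POSIXct(c("2021-04-01 21:30:06.186", "2021-04-01 21:30:14.186",
                   "2021-04-01 21:30:22.186", "2021-04-01 21:30:30.185",
                   "2021-04-01 21:30:38.185", "2021-04-01 21:30:46.185"),
                 tz = "GMT"),
  sp = c(0, 6.57466869906985, 0.320435364660776, 3.25480089593961,
         3.43178191624026, 3.34610770929176),
  ta = c(0, 0, 0.0658546845459325, 0.311226675793708,
         0.196989706737039, 0.260257380057078)
)

# cut() turns the POSIXct column into a factor of 3-minute bins;
# each level is labelled with the bin's start time.
df$bin <- cut(df$T, breaks = "3 min")

# Return the mean and standard deviation of one numeric vector together.
mean_sd <- function(v) c(mean = mean(v), sd = sd(v))

# aggregate() applies mean_sd to sp and ta within each bin. The result holds
# matrix columns, which do.call(data.frame, ...) flattens into
# sp.mean, sp.sd, ta.mean and ta.sd.
res <- aggregate(cbind(sp, ta) ~ bin, data = df, FUN = mean_sd)
res <- do.call(data.frame, res)
res

# A bin holding a single observation gets NA for its standard deviation.
sd(5)  # NA
```

As with the dplyr version, empty bins do not appear in the aggregate() result.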