This is the head() of my dataset:
df <- data.frame(
  X = c(
    243813.672143309, 243820.16680888, 243819.847679243, 243816.851755806,
    243814.016524682, 243817.173014157
  ),
  Y = c(
    717413.771532459, 717412.74899267, 717412.77789073, 717414.049964481,
    717415.983508272, 717414.873097992
  ),
  T = as.POSIXct(
    c(
      "2021-04-01 21:30:06.186", "2021-04-01 21:30:14.186",
      "2021-04-01 21:30:22.186", "2021-04-01 21:30:30.185",
      "2021-04-01 21:30:38.185", "2021-04-01 21:30:46.185"
    ),
    tz = "GMT"
  ),
  sp = c(
    0, 6.57466869906985, 0.320435364660776, 3.25480089593961, 3.43178191624026,
    3.34610770929176
  ),
  ta = c(0, 0, 0.0658546845459325, 0.311226675793708, 0.196989706737039, 0.260257380057078),
  row.names = 1688614:1688619
)
My objective is to segment the time series by the T (time) attribute, say every 3 minutes, and calculate the mean and standard deviation of the sp and ta attributes in each chunk.
I don't have working code for this yet. I was thinking along the lines of looping over a sequence from head(df$T, 1) to tail(df$T, 1) in 3-minute steps and extracting the records in each segment to calculate the mean and standard deviation of certain columns. But I suppose this is not the best way to approach the problem in R.
Any help is appreciated. Using R 4.2.1.
You could do this with cut(), which bins a date-time vector into fixed intervals.
library(dplyr)

df |>
  group_by(cut(T, breaks = "3 min")) |>
  summarise(across(c(sp, ta), list(mean = mean, sd = sd)))
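If you would rather not depend on dplyr, the same binning idea can be sketched in base R with cut() and aggregate(). This is only an illustration under assumptions: the data frame below is synthetic (one reading every 40 seconds, with made-up sp and ta values) so that it spans more than one 3-minute bin, and the names bin and stats are my own.

```r
# Synthetic series (hypothetical values): one reading every 40 seconds, 12 readings.
df <- data.frame(
  T  = as.POSIXct("2021-04-01 21:30:00", tz = "GMT") + seq(0, by = 40, length.out = 12),
  sp = c(0, 6.5, 0.3, 3.2, 3.4, 3.3, 2.1, 4.8, 1.9, 2.7, 5.0, 3.6),
  ta = c(0, 0, 0.07, 0.31, 0.20, 0.26, 0.11, 0.05, 0.42, 0.18, 0.09, 0.33)
)

# cut() on a POSIXct vector returns a factor whose levels are the bin start times.
df$bin <- cut(df$T, breaks = "3 min")

# aggregate() applies the summary function to sp and ta within each bin.
stats <- aggregate(
  cbind(sp, ta) ~ bin,
  data = df,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)

# aggregate() returns matrix columns here; flatten them into ordinary columns
# named sp.mean, sp.sd, ta.mean and ta.sd.
stats <- do.call(data.frame, stats)
print(stats)
```

With this synthetic input the result has one row per 3-minute bin (three in total), which is the same shape the dplyr version produces.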
By the way, your sample data only covers 21:30, so all six rows fall into a single 3-minute bin. Let me show an example with the flights dataset instead.
library(dplyr)
library(nycflights13) # provides the flights dataset

# Build a minute-resolution datetime so it can be cut into 3-minute intervals.
flights <- flights |>
  mutate(datetime = as.POSIXct(
    paste0(
      substr(time_hour, 1, 14),
      minute,
      substr(time_hour, 17, 19)
    )
  ))

flights |>
  group_by(cut(datetime, breaks = "3 min")) |>
  summarise(across(
    c(air_time, distance),
    list(mean = ~ mean(., na.rm = TRUE), sd = ~ sd(., na.rm = TRUE))
  ))
Output:
  cut(datetime, breaks = "3 min") air_time_mean air_time_sd distance_mean distance_sd
1             2013-01-01 05:15:00       227.000          NA      1400.000          NA
2             2013-01-01 05:27:00       227.000          NA      1416.000          NA
3             2013-01-01 05:39:00       160.000          NA      1089.000          NA
4             2013-01-01 05:45:00       183.000          NA      1576.000          NA
5             2013-01-01 05:57:00        97.000    74.95332       453.000    376.1808
6             2013-01-01 06:00:00       196.875   100.59548      1273.941    724.5070
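The data is too sparse to give both a mean and an sd for every 3-minute bin, though: sd() needs at least two non-missing values, so it returns NA for the bins that don't have them.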
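One way to see why an sd comes back as NA is to add a per-bin row count next to the summaries. The sketch below uses a small made-up series (the obs data frame and its values are hypothetical, not taken from the question or from flights) in which the middle bin holds a single reading.

```r
library(dplyr)

# Hypothetical irregular series: offsets in seconds from 21:30:00 GMT.
# The second 3-minute bin (21:33 to 21:36) contains only one reading.
obs <- data.frame(
  T  = as.POSIXct("2021-04-01 21:30:00", tz = "GMT") + c(10, 70, 130, 200, 370, 400),
  sp = c(1, 2, 3, 4, 5, 7)
)

res <- obs |>
  group_by(bin = cut(T, breaks = "3 min")) |>
  summarise(
    n       = n(),      # number of rows in the bin
    sp_mean = mean(sp),
    sp_sd   = sd(sp),   # NA when the bin has fewer than two values
    .groups = "drop"
  )
print(res)
```

Here the bins contain 3, 1 and 2 rows, and only the single-row bin has an NA standard deviation. If such bins are not useful, appending filter(n > 1) to the pipeline would drop them.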