Tags: r, datetime, subset, segment

How can I separate time-series data into segments of continuous data according to a range of values in another column?


I would like to split a time series into separate segments (each one its own data frame) according to the values of another column. For example:

# Generate a data frame of hourly precipitation and water level.
# install.packages("lubridate")  # run once if lubridate is not installed
library(lubridate)
df <- data.frame(
  date_time   = ymd_hms(seq(ISOdate(2000, 3, 20), by = "hour", length.out = 365)),
  precip      = sample(0:10, 365, replace = TRUE),
  water_level = sample(-50:50, 365, replace = TRUE)
)

I would like to make a subset for each continuous stretch of time during which the water level is negative, keeping the date-time values as they are (parsed using lubridate) along with the water level and precipitation for that stretch.


Solution

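  • We can create a grouping variable that changes every time the water level switches between negative and non-negative, and then split on it. In your case, the logical vector df$water_level >= 0 is TRUE for non-negative values and FALSE for negative ones. diff() of that vector is non-zero exactly where it flips from TRUE to FALSE or back, so the cumulative sum of those flips (with a leading TRUE so that the first row starts group 1) gives every run of consecutive rows its own sequential group number. Note that this produces a segment for every run, non-negative as well as negative.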

    split(df, cumsum(c(TRUE, diff(df$water_level >= 0) != 0)))
    

    which gives,

    $`1`
                date_time precip water_level
    1 2000-03-20 12:00:00      8          45
    
    $`2`
                date_time precip water_level
    2 2000-03-20 13:00:00      9         -12
    
    $`3`
                date_time precip water_level
    3 2000-03-20 14:00:00      4           9
    4 2000-03-20 15:00:00      0          13
    5 2000-03-20 16:00:00      8          34
    
    $`4`
                date_time precip water_level
    6 2000-03-20 17:00:00      1         -20
    ...
    ...
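Since the question asks only for the negative stretches, the same run ids can be used to keep just those segments. The following is a minimal sketch, not part of the original answer: the `toy` data frame and its values are made up for illustration, and it assumes `water_level` contains no `NA` values (an `NA` would propagate through `diff()` and `cumsum()`).

```r
# Toy data (hypothetical values) standing in for the question's df
toy <- data.frame(
  date_time   = seq(ISOdate(2000, 3, 20), by = "hour", length.out = 8),
  precip      = c(8, 9, 4, 0, 8, 1, 3, 2),
  water_level = c(45, -12, 9, 13, 34, -20, -5, 7)
)

# TRUE for non-negative levels, FALSE for negative ones
is_nonneg <- toy$water_level >= 0

# TRUE at the first row and at every row where the flag flips
run_start <- c(TRUE, diff(is_nonneg) != 0)

# Cumulative sum of run starts gives one id per run of consecutive rows
run_id <- cumsum(run_start)

# Drop the non-negative rows, then split what is left by run id
neg_segments <- split(toy[!is_nonneg, ], run_id[!is_nonneg])

print(run_id)
print(neg_segments)
```

Each element of `neg_segments` is a data frame covering one uninterrupted negative stretch, with `date_time` still a POSIXct column. Applying the same two steps to `df` instead of `toy` should give the segments the question asks for.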