
R: Splitting a Time Series into custom seasons


I have a time series data frame:

[https://www.dropbox.com/s/elaxfuvqyip1eq8/SampleDF.csv?dl=0][1]

I want to divide this data frame into seasons, defined as follows:

  1. Winter: Dec, Jan, Feb
  2. Pre-monsoon: Mar, Apr, May, and Jun up to the 15th
  3. Monsoon: Jun from the 15th, Jul, Aug, Sep
  4. Post-monsoon: Oct, Nov

I tried the selectByDate() function from the openair package, but have had no luck so far. I am a novice in R, so any help would be highly appreciated.

Thanks!


Solution

  • Have a look at the lubridate package, which makes working with dates and times a bit easier.

    For your problem, I guess you can use sapply:

    # iterate over the values of the date column, not over the one-column data frame
    df$season <- sapply(df$date, assign_season)
    

    where assign_season is:

    assign_season <- function(date){
        # return a season based on date 
    
    }
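
    One way to fill in the function body, as a minimal sketch. It assumes the input has already been parsed to Date (or POSIXct), and it treats 15 June itself as monsoon, which is an assumption about the boundary in the question. Because it works on whole vectors, it can be called on a column directly, without sapply.

    ```r
    # Sketch: map dates to the four seasons defined in the question.
    # Assumes `date` is a Date or POSIXct vector; 15 June counts as monsoon.
    assign_season <- function(date) {
      m <- as.integer(format(date, "%m"))  # month number, 1-12
      d <- as.integer(format(date, "%d"))  # day of the month
      season <- rep(NA_character_, length(date))
      season[m %in% c(12, 1, 2)] <- "winter"
      season[m %in% 3:5 | (m == 6 & d < 15)] <- "pre_monsoon"
      season[(m == 6 & d >= 15) | m %in% 7:9] <- "monsoon"
      season[m %in% 10:11] <- "post_monsoon"
      season
    }

    # Small check on hand-picked dates around the season boundaries
    dates <- as.Date(c("2015-01-10", "2015-06-14", "2015-06-15", "2015-11-30"))
    result <- assign_season(dates)
    print(result)
    ```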
    

    Once you have the seasons, you can divide the data frame easily:

    winter = subset(df, season == "winter")
    # and so on
    

    Sorry, I have to rush now, but can come back and finish this, if someone else hasn't answered already.

    EDIT:

    R has a built-in function, cut, which works on dates and bins a vector according to date ranges.
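
    To show how the binning behaves, here is a small sketch on made-up dates (not taken from the sample file). For Date input, cut uses right = FALSE by default, so each bin includes its start date and excludes its end date.

    ```r
    # Sketch: bin a few dates into labelled ranges with cut().
    # Two bins: [2015-03-01, 2015-06-15) and [2015-06-15, 2015-10-01)
    x <- as.Date(c("2015-03-01", "2015-06-14", "2015-06-15", "2015-09-30"))
    breaks <- as.Date(c("2015-03-01", "2015-06-15", "2015-10-01"))
    bins <- cut(x, breaks = breaks, labels = c("pre_monsoon", "monsoon"))
    print(as.character(bins))
    ```

    Dates that fall outside every range come back as NA, so the breaks need to cover the full span of the data.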

    For your data, I did it like this:

    library(lubridate)
    library(dplyr)
    
    df <- read.csv("SampleDF.csv")
    
    ## parse the date-time strings and keep only the calendar date;
    ## using Date for both the data and the breaks avoids time zone mismatches
    df <- df %>%
      mutate(date_reformat = as_date(mdy_hm(date)))
    
    ## define breaks & labels; each season runs from one break up to,
    ## but not including, the next
    breaks <- as.Date(c("2014-12-01", "2015-03-01", "2015-06-15", "2015-10-01", "2015-12-01", "2016-03-01", "2016-06-15", "2016-10-01", "2016-12-01", "2017-03-01"))
    labels <- c("winter", "pre_monsoon", "monsoon", "post_monsoon", "winter", "pre_monsoon", "monsoon", "post_monsoon", "winter")
    ## the repeated labels collapse into four factor levels
    df$season <- cut(df$date_reformat, breaks = breaks, labels = labels)
    
    ## collect one data frame per season
    splits <- list()
    
    for (s in c("winter", "pre_monsoon", "monsoon", "post_monsoon")) {
      splits[[s]] <- subset(df, season == s)[c("date", "value")]
    }
    
    

    Now the splits list should hold all the data you need, with one data frame per season.
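
    As an alternative to the loop, base R's split() can build the same kind of list in one call. A minimal sketch follows, using toy data in place of the real file; the column names date and value are assumed to match the CSV.

    ```r
    # Sketch: split() returns a named list with one data frame per factor level.
    # Toy data stands in for the real file; the values are made up.
    df <- data.frame(
      date = as.Date(c("2015-01-05", "2015-04-20", "2015-07-01", "2015-10-15")),
      value = c(1.2, 3.4, 5.6, 7.8),
      season = factor(c("winter", "pre_monsoon", "monsoon", "post_monsoon"))
    )
    splits <- split(df[c("date", "value")], df$season)
    print(names(splits))
    ```

    The list elements are named after the factor levels, so splits$winter or splits[["monsoon"]] picks out a single season.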