Categorizing data using date variable in R


I am having trouble using the date variable in my dataset to create categories of six-month time periods. I want to create these categories for dates between 2017-1-1 and 2020-6-30: 2017-1-1 to 2017-6-30, then 2017-7-1 to 2017-12-31, and so on until 2020-6-30. I have tried the following two pieces of code to create the date categories, but both give a similar error:

#CODE1
#checking for date class
myData <- str(myData)
myData #date in factor class
#convert to date class
date_class <- as.Date(myData$date, format = "%m/%d/%Y")
myData$date_class <- as.Date(myData$date, format = "%m/%d/%Y")
myData
#creating timeperiod category 1
date_cat <- NA
myData$date_cat[which(myData$date_class >= "2017-1-1" & myData$date_class < "2017-7-1")] <- 1

#CODE2
#converting to date format
myData$date <- strptime(myData$date,format="%m/%d/%Y")
myData$date <- as.POSIXct(myData$date)
myData
#creating timeperiod category 1
date_cat <- NA
myData$date_cat[which(myData$date >= "2017-1-1" & myData$date < "2017-7-1")] <- 1

Both versions of the code give a similar error:

Error in $<-.data.frame(*tmp*, date_cat, value = numeric(0)) :
replacement has 0 rows, data has 1123

Please help me understand where I am going wrong.

Thanks, Priya


Solution

  • Here's a function (to.interval) that returns a time interval index {0, 1, 2, 3, ...}, given an anchor date, an event date, and an interval width in days. It is probably a good idea to include error checking in the function so that, for example, it returns NA if the event date is before the anchor date.

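    df <- data.frame(event.date = as.Date(c("2017-01-01", "2017-08-01", "2018-04-30")))

    # Zero-based index of the interval.days-wide window, counted from
    # anchor.date, that contains future.date.
    to.interval <- function(anchor.date, future.date, interval.days) {
      days <- as.integer(future.date - anchor.date)
      # Integer division keeps every window exactly interval.days wide;
      # round() would move a date into the next window halfway through.
      days %/% interval.days
    }

    df$interval <- to.interval(as.Date("2017-01-01"), df$event.date, 180)

    df


    Output

      event.date interval
    1 2017-01-01        0
    2 2017-08-01        1
    3 2018-04-30        2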
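
A fixed 180-day width only approximates the calendar half-years described in the question (January to June, July to December). As a minimal sketch of an alternative, the category can be derived from the year and month of each date. The sample rows below are hypothetical, and the column names date, date_class, and date_cat follow the question's code; the m/d/Y text format is assumed from the question.

```r
# Hypothetical sample data; the question's myData holds dates as m/d/Y text.
myData <- data.frame(
  date = c("01/01/2017", "06/30/2017", "07/01/2017", "06/30/2020", "12/31/2016"),
  stringsAsFactors = FALSE
)

# Parse the text into Date; values that do not match the format become NA.
myData$date_class <- as.Date(myData$date, format = "%m/%d/%Y")

# Calendar year, and half of the year (1 = Jan-Jun, 2 = Jul-Dec).
yr   <- as.integer(format(myData$date_class, "%Y"))
half <- ifelse(as.integer(format(myData$date_class, "%m")) <= 6, 1L, 2L)

# Consecutive category: 1 = 2017 H1, 2 = 2017 H2, ..., 7 = 2020 H1.
myData$date_cat <- (yr - 2017L) * 2L + half

# Dates outside 2017-01-01 .. 2020-06-30 (or unparsed dates) get NA.
in_window <- !is.na(myData$date_class) &
  myData$date_class >= as.Date("2017-01-01") &
  myData$date_class <= as.Date("2020-06-30")
myData$date_cat[!in_window] <- NA

myData
```

Because date_cat is assigned as a full-length column before any subsetting, this avoids the "replacement has 0 rows" error. That error is consistent with assigning into a column that does not exist yet (date_cat <- NA creates a separate object, not a column) while which() selects no rows, which can happen if the parsed dates are all NA. Checking sum(is.na(myData$date_class)) after parsing would confirm whether the format string matches the data.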