r dplyr boolean posixct as.date

Cannot get subset of data frame based on date columns


I am making a report about a data frame containing transactions for parking. The data frame has two POSIXct variables called start_time and end_time. An example of how they look is this:

"2019-11-25 10:35:00 UTC" "2019-11-28 18:21:00 UTC"

I want to get a subset of my data frame from "2020-02-06" to "2020-03-23". There are definitely transactions that occurred during this time, but when I use the following code, I get a subset with zero observations.

cutdate <- as.Date("2020-03-23", format = "%Y-%m-%d")
bdate <- as.Date("2020-06-02", format = "%Y-%m-%d")

dwdf %>% filter(as.Date(start_time) >= bdate & as.Date(end_time) < cutdate) -> Bdf

I have tried other functions such as subset, but I get the same result. I then tried something simpler: checking whether any value of start_time is later than bdate, using a Boolean expression like

any(as.Date(dwdf$start_time) > bdate)

I expected at least one TRUE, but the result is FALSE. I do not understand why, because range(start_time) clearly shows start times later than February 6th.

I have also looked at similar questions on Stack Exchange, and I don't know why I am getting different results, since I am doing what those answers suggest.


Solution

  • I would strongly recommend the lubridate package for dealing with time data.

    library(lubridate)
    
    x <- c("2019-11-25 10:35:00 UTC", "2019-11-28 18:21:00 UTC")
    
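    # Parse the strings into POSIXct date-times (UTC by default)
    x_datetime <- ymd_hms(x)
    
    # Convert to Date first so that a Date is compared with a Date
    as_date(x_datetime) < ymd(20191126)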
    [1]  TRUE FALSE
    

    These functions (ymd as in "year-month-day", ymd_hms as in "year-month-day-hour-minute-second") make it easy to convert text into Date or date-time objects. Once you have that, comparisons are straightforward.
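
Applied to the filter in the question, this could look like the sketch below. It assumes a small made-up data frame standing in for dwdf (the rows are illustrative, not the asker's data) and uses lubridate's as_date() so that Date values are compared with Date values. One thing worth checking: with format "%Y-%m-%d", the string "2020-06-02" in the question parses as June 2, not February 6, so a condition of start >= June 2 and end < March 23 can never be true. That may be why the subset comes back empty.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for dwdf; these rows are invented for illustration
dwdf <- data.frame(
  start_time = ymd_hms(c("2019-11-25 10:35:00", "2020-02-10 08:00:00",
                         "2020-03-01 12:30:00", "2020-03-22 09:00:00")),
  end_time   = ymd_hms(c("2019-11-28 18:21:00", "2020-02-10 17:45:00",
                         "2020-03-02 09:15:00", "2020-03-24 10:00:00"))
)

# The literal from the question is read as year-month-day, i.e. June 2
bdate_in_question <- as.Date("2020-06-02", format = "%Y-%m-%d")
month(bdate_in_question)  # 6

# Bounds written as year-month-day for February 6 and March 23
bdate   <- ymd("2020-02-06")
cutdate <- ymd("2020-03-23")

# Keep transactions that start on or after bdate and end before cutdate
Bdf <- dwdf %>%
  filter(as_date(start_time) >= bdate, as_date(end_time) < cutdate)

nrow(Bdf)  # 2 rows of the made-up data fall inside the window
```

Passing the two conditions to filter() as separate arguments combines them with a logical AND, the same as the & in the original call.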