I am writing a report based on a data frame of parking transactions. The data frame has two POSIXct variables, start_time and end_time. Here is an example of what they look like:
"2019-11-25 10:35:00 UTC" "2019-11-28 18:21:00 UTC"
I want to get the subset of my data frame from "2020-02-06" to "2020-03-23".
There are definitely transactions in this period, but when I use the following code I get a subset with zero observations:
library(dplyr)
cutdate <- as.Date("2020-03-23", format = "%Y-%m-%d")
bdate <- as.Date("2020-06-02", format = "%Y-%m-%d")
# keep rows that start on/after bdate and end before cutdate
Bdf <- dwdf %>% filter(as.Date(start_time) >= bdate & as.Date(end_time) < cutdate)
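One thing worth checking: the start date in this code does not match the range stated above. With format = "%Y-%m-%d", the string "2020-06-02" is read as 2 June 2020, not 6 February, so the lower bound ends up after the upper bound. A minimal base-R check, reusing the question's variable names (bdate_intended is a hypothetical name added here for comparison):

```r
# Dates exactly as written in the question's code
cutdate <- as.Date("2020-03-23", format = "%Y-%m-%d")
bdate   <- as.Date("2020-06-02", format = "%Y-%m-%d")  # read as 2 June, not 6 February

# The month component of bdate is June
format(bdate, "%m")
#> [1] "06"

# The lower bound lies after the upper bound, so no row with
# start_time <= end_time can satisfy start >= bdate & end < cutdate
bdate > cutdate
#> [1] TRUE

# With the intended start date (6 February) the window is non-empty
bdate_intended <- as.Date("2020-02-06", format = "%Y-%m-%d")
bdate_intended < cutdate
#> [1] TRUE
```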
I have tried other functions such as subset(), but I get the same result. I then tried something simpler: a boolean check on start_time alone, to see whether I would get any TRUE values:
any(as.Date(dwdf$start_time) > bdate)
All I get is FALSE. I don't understand why, because if I run range(dwdf$start_time), there are clearly start times later than February 6th.
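The behaviour can be reproduced on a small stand-in for dwdf. The rows below are invented purely for illustration (they are not the question's data), and the timestamps are assumed to be in UTC:

```r
# Hypothetical toy data standing in for dwdf (values invented for illustration)
dwdf <- data.frame(
  start_time = as.POSIXct(c("2019-11-25 10:35:00", "2020-02-10 08:00:00",
                            "2020-03-01 12:00:00"), tz = "UTC"),
  end_time   = as.POSIXct(c("2019-11-28 18:21:00", "2020-02-10 17:30:00",
                            "2020-03-02 09:15:00"), tz = "UTC")
)

# Latest start is 2020-03-01: after 6 February, but before 2 June
range(dwdf$start_time)

bdate   <- as.Date("2020-06-02")  # as typed in the question
cutdate <- as.Date("2020-03-23")

# No start_time falls after 2 June, so this is FALSE
any(as.Date(dwdf$start_time, tz = "UTC") > bdate)
#> [1] FALSE

# Using 6 February as the lower bound keeps the two rows inside the window
bdate_intended <- as.Date("2020-02-06")
Bdf <- subset(dwdf, as.Date(start_time, tz = "UTC") >= bdate_intended &
                    as.Date(end_time,   tz = "UTC") <  cutdate)
nrow(Bdf)
#> [1] 2
```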
I have also looked at similar questions on Stack Exchange, and I don't understand why I get different results when I am doing what they suggest.
I would strongly recommend the lubridate package for dealing with time data.
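library(lubridate)
x <- c("2019-11-25 10:35:00 UTC", "2019-11-28 18:21:00 UTC")
x_datetime <- ymd_hms(x)  # parse to POSIXct (UTC)
# convert to Date before comparing with a Date, so both sides use the same units
as_date(x_datetime) < ymd(20191126)
[1] TRUE FALSE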
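If you would rather avoid the extra dependency, the same comparison can be sketched in base R. This is a swapped-in alternative, not part of lubridate, and it assumes all timestamps are in UTC:

```r
x <- c("2019-11-25 10:35:00 UTC", "2019-11-28 18:21:00 UTC")

# Parse as POSIXct in UTC; strptime ignores the trailing " UTC" text
x_datetime <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Compare on the Date scale so both operands are Dates
as.Date(x_datetime, tz = "UTC") < as.Date("2019-11-26")
#> [1]  TRUE FALSE
```

Converting both sides to the same class avoids comparing a date-time (seconds) against a Date (days).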
These functions (ymd, as in "year-month-day", or ymd_hms, as in "year-month-day-hour-minute-second") make it easy to convert the data into date and date-time classes. Once you have that, comparisons are straightforward.