Search code examples
rdataframedplyrtidyverseposixct

Filtering "POSIXct" "POSIXt" column based on value and NA in R


I have a dataframe that looks more or less like this:

    tail(df)
    # A tibble: 6 x 3
      GEOGCD    OPER_DATE           TERM_DATE          
      <chr>     <dttm>              <dttm>             
    1 E05006867 2009-01-01 00:00:00 2019-03-31 00:00:00
    2 E05006868 2009-01-01 00:00:00 2019-03-31 00:00:00
    3 E05000066 2009-01-01 00:00:00 2018-05-02 00:00:00
    4 E05000067 2009-01-01 00:00:00 2018-05-02 00:00:00
    5 E05000068 2009-01-01 00:00:00 2018-05-02 00:00:00
    6 E05000064 2018-05-01 22:00:00 NA          
    
    str(df)
    tibble [52 × 3] (S3: tbl_df/tbl/data.frame)
     $ GEOGCD   : chr [1:52] "E05000064" "E05000065" "E05000066" "E05000067" ...
     $ OPER_DATE: POSIXct[1:52], format: "2009-01-01 00:00:00" "2009-01-01 00:00:00" "2009-01-01 00:00:00" ...
     $ TERM_DATE: POSIXct[1:52], format: "2018-05-02" "2018-05-02" "2018-05-02" ...

What I want to do is to select only those who have a TERM_DATE above 2018-12-31 OR are NA. Basically something like this:

3 E05000066 2009-01-01 00:00:00 2018-05-02 00:00:00
4 E05000067 2009-01-01 00:00:00 2018-05-02 00:00:00
5 E05000068 2009-01-01 00:00:00 2018-05-02 00:00:00
6 E05000064 2018-05-01 22:00:00 NA   

I've tried different things, like this:

library(lubridate)
library(dplyr)    
df%>%
filter(TERM_DATE> as.Date("2018-12-31"| is.na(TERM_DATE)))

But I keep on getting errors like the following:

Error: Problem with filter() input ..1.
x operations are possible only for numeric, logical or complex types
ℹ Input ..1 is TERM_DATE > as.Date("2018-12-31" | is.na(TERM_DATE)).

Can any of you understand why that might be and what I should do instead?

Thanks!


Solution

  • Try this approach:

    library(dplyr)
    #Code
    newdf <- df%>%
      filter(TERM_DATE> as.POSIXct("2018-12-31") | is.na(TERM_DATE))
    

    Output:

         GEOGCD           OPER_DATE  TERM_DATE
    1 E05006867 2009-01-01 00:00:00 2019-03-31
    2 E05006868 2009-01-01 00:00:00 2019-03-31
    3 E05000064 2018-05-01 22:00:00       <NA>
    

    The smart solution from @StupidWolf also works:

    #Code 2
    df%>%
      filter(TERM_DATE> as.Date("2018-12-31") | is.na(TERM_DATE))
    

    Output:

         GEOGCD           OPER_DATE  TERM_DATE
    1 E05006867 2009-01-01 00:00:00 2019-03-31
    2 E05006868 2009-01-01 00:00:00 2019-03-31
    3 E05000064 2018-05-01 22:00:00       <NA>
    

    The output expected from OP can be reached using:

    #Code 3
    newdf <- df%>%
      filter(TERM_DATE< as.POSIXct("2018-12-31") | is.na(TERM_DATE))
    

    Output:

         GEOGCD           OPER_DATE  TERM_DATE
    1 E05000066 2009-01-01 00:00:00 2018-05-02
    2 E05000067 2009-01-01 00:00:00 2018-05-02
    3 E05000068 2009-01-01 00:00:00 2018-05-02
    4 E05000064 2018-05-01 22:00:00       <NA>
    

    Or using as.Date(). You need to change the comparison to <.