r, dplyr, lubridate

dplyr is turning dates to doubles when mutating dataframe


I'm working with a data frame that contains dates and times. I need to set certain values in a date column to NA, but when I do this with mutate() and ifelse(), the date column turns into a double column!

Example:

df
##   my_date_01    my_date_02
##   <date>        <date>    
## 1 2018-09-13    NA        
## 2 NA            2018-12-09
## 3 NA            2018-12-16
## 4 NA            2019-01-19
## 5 NA            2018-11-19
## 6 NA            2018-12-16
## 7 NA            2018-08-08
## 8 NA            2018-10-24
## 9 2019-03-26    NA        
##10 NA            2018-12-11

df %>% mutate(my_date_01 = ifelse(my_date_01 > lubridate::date("2019-01-01"), 
                                  NA, 
                                  my_date_01))
##   my_date_01    my_date_02
##   <dbl>         <date>    
## 1      17787    NA        
## 2         NA    2018-12-09
## 3         NA    2018-12-16
## 4         NA    2019-01-19
## 5         NA    2018-11-19
## 6         NA    2018-12-16
## 7         NA    2018-08-08
## 8         NA    2018-10-24
## 9         NA    NA        
##10         NA    2018-12-11

Two things are happening here:

  1. The desired dates are indeed being replaced with NA (this is OK), but
  2. The whole column is converted to double (this is very wrong)

What am I missing?


Solution

  • Wrap the NA in as.Date() so that both outcomes have the same type, and use case_when() instead of ifelse(), which keeps the Date class:

    library(dplyr)
    df %>% 
      mutate(my_date_01 = case_when(
        my_date_01 > lubridate::date("2019-01-01") ~ as.Date(NA), 
        TRUE ~ my_date_01
      ))
    # my_date_01 my_date_02
    #1  2018-09-13       <NA>
    #2        <NA> 2018-12-09
    #3        <NA> 2018-12-16
    #4        <NA> 2019-01-19
    #5        <NA> 2018-11-19
    #6        <NA> 2018-12-16
    #7        <NA> 2018-08-08
    #8        <NA> 2018-10-24
    #9        <NA>       <NA>
    #10       <NA> 2018-12-11
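
As an alternative to case_when(), dplyr's own if_else() also keeps the Date class. The following is only a minimal sketch, not part of the original answer: it uses a shortened three-row stand-in for df, and the name cutoff and the use of as.Date() in place of lubridate::date() are assumptions made for illustration.

```r
library(dplyr)

# Shortened stand-in for the question's df (assumed, three rows only)
df <- data.frame(
  my_date_01 = as.Date(c("2018-09-13", NA, "2019-03-26")),
  my_date_02 = as.Date(c(NA, "2018-12-09", NA))
)

# Base R stand-in for lubridate::date("2019-01-01")
cutoff <- as.Date("2019-01-01")

# if_else() builds its result from the true/false values, both of which
# are Dates here, so the column keeps its class
res <- df %>%
  mutate(my_date_01 = if_else(my_date_01 > cutoff, as.Date(NA), my_date_01))

class(res$my_date_01)  # "Date"
```

Rows where the comparison itself is NA come back as NA, which matches the case_when() result above.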
    

    Passing as.Date(NA) to ifelse() still does not work, because ifelse() builds its result by assigning into the logical vector test, as the end of its source code shows:

    ...
        ans <- test # the result starts out as the logical vector test
        len <- length(ans)
        ypos <- which(test)
        npos <- which(!test)
        if (length(ypos) > 0L) 
            ans[ypos] <- rep(yes, length.out = len)[ypos]
        if (length(npos) > 0L) 
            ans[npos] <- rep(no, length.out = len)[npos]
        ans
    }
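
The effect of that assignment can be reproduced in isolation. This is a small base R sketch with made-up variable names (d, ans_logical, ans_date), not code from ifelse() itself; it only contrasts assigning Date values into a logical vector with assigning them into a vector that is already a Date.

```r
d <- as.Date(c("2018-09-13", NA, "2019-03-26"))
test <- d > as.Date("2019-01-01")  # FALSE NA TRUE
npos <- which(!test)               # positions that keep their original date

# Starting from the logical vector, as ifelse() does
ans_logical <- test
ans_logical[npos] <- d[npos]
class(ans_logical)  # "numeric": the Date class is dropped

# Starting from a vector that is already a Date
ans_date <- as.Date(rep(NA, length(test)))
ans_date[npos] <- d[npos]
class(ans_date)     # "Date"
```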
    

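    If the first line of that excerpt, ans <- test, were instead

    ans <- as.Date(rep(NA, length(test)))
    

    then a copy of ifelse() with that change (called ifelsenew here) would work with the code below. Assigning Date values into a logical vector coerces the result to plain numeric and drops the Date class, whereas assigning into a vector that is already a Date keeps the class.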

    df %>% 
      mutate(my_date_01 = ifelsenew(my_date_01 > lubridate::date("2019-01-01"), 
                                    as.Date(NA), 
                                    my_date_01))
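
ifelsenew is not a base R or dplyr function, and the answer does not spell out its definition. One way it might look, assuming it is simply the ifelse() excerpt above with the changed first line (and leaving out the other checks that the real ifelse() performs), is sketched here on a plain vector:

```r
# Hypothetical helper, not part of base R or dplyr: the ifelse() excerpt
# with the result initialised as a Date vector instead of the logical test
ifelsenew <- function(test, yes, no) {
  ans <- as.Date(rep(NA, length(test)))  # Date from the start
  len <- length(ans)
  ypos <- which(test)
  npos <- which(!test)
  if (length(ypos) > 0L)
    ans[ypos] <- rep(yes, length.out = len)[ypos]
  if (length(npos) > 0L)
    ans[npos] <- rep(no, length.out = len)[npos]
  ans
}

d <- as.Date(c("2018-09-13", NA, "2019-03-26"))
cutoff <- as.Date("2019-01-01")  # base R stand-in for lubridate::date()

out_new <- ifelsenew(d > cutoff, as.Date(NA), d)
out_old <- ifelse(d > cutoff, as.Date(NA), d)

class(out_new)  # "Date"
class(out_old)  # "numeric"
```

Because the result is hard-coded as a Date, this sketch is only suitable for Date inputs; it is not a general replacement for ifelse().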
    

    data

    df <- structure(list(my_date_01 = structure(c(17787, NA, NA, NA, NA, 
    NA, NA, NA, 17981, NA), class = "Date"), my_date_02 = structure(c(NA, 
    17874, 17881, 17915, 17854, 17881, 17751, 17828, NA, 17876), class = "Date")), 
     row.names = c("1", 
    "2", "3", "4", "5", "6", "7", "8", "9", "10"), class = "data.frame")