
hablar::dte() Issue in converting a datetime of class POSIXct to a date


In R 4.2.3, I found that applying dte() to date-times of class "POSIXct" shifts the date back by one day. The following issue is copied from https://github.com/davidsjoberg/hablar/issues/17; please see the link for more information.

Thank you for providing a package that lets me quickly change the classes of variables. I found that applying dte() to date-times of class "POSIXct" shifts the date back by one day. Please see the example below.

``` r
library(readxl)   # read_excel(), readxl_example()
library(hablar)   # dte()
library(magrittr) # %>%

A <- read_excel(
  readxl_example("deaths.xlsx"),
  range = "arts!A5:F15",
  .name_repair = "universal"
)
#> New names:
#> • `Has kids` -> `Has.kids`
#> • `Date of birth` -> `Date.of.birth`
#> • `Date of death` -> `Date.of.death`

class(A$Date.of.birth)
#> [1] "POSIXct" "POSIXt"

A
#> # A tibble: 10 × 6
#>    Name            Profe…¹   Age Has.k…² Date.of.birth       Date.of.death      
#>    <chr>           <chr>   <dbl> <lgl>   <dttm>              <dttm>             
#>  1 David Bowie     musici…    69 TRUE    1947-01-08 00:00:00 2016-01-10 00:00:00
#>  2 Carrie Fisher   actor      60 TRUE    1956-10-21 00:00:00 2016-12-27 00:00:00
#>  3 Chuck Berry     musici…    90 TRUE    1926-10-18 00:00:00 2017-03-18 00:00:00
#>  4 Bill Paxton     actor      61 TRUE    1955-05-17 00:00:00 2017-02-25 00:00:00
#>  5 Prince          musici…    57 TRUE    1958-06-07 00:00:00 2016-04-21 00:00:00
#>  6 Alan Rickman    actor      69 FALSE   1946-02-21 00:00:00 2016-01-14 00:00:00
#>  7 Florence Hende… actor      82 TRUE    1934-02-14 00:00:00 2016-11-24 00:00:00
#>  8 Harper Lee      author     89 FALSE   1926-04-28 00:00:00 2016-02-19 00:00:00
#>  9 Zsa Zsa Gábor   actor      99 TRUE    1917-02-06 00:00:00 2016-12-18 00:00:00
#> 10 George Michael  musici…    53 FALSE   1963-06-25 00:00:00 2016-12-25 00:00:00
#> # … with abbreviated variable names ¹Profession, ²Has.kids

A %>% hablar::convert(dte(starts_with("Date")))
#> # A tibble: 10 × 6
#>    Name               Profession   Age Has.kids Date.of.birth Date.of.death
#>    <chr>              <chr>      <dbl> <lgl>    <date>        <date>       
#>  1 David Bowie        musician      69 TRUE     1947-01-07    2016-01-09   
#>  2 Carrie Fisher      actor         60 TRUE     1956-10-20    2016-12-26   
#>  3 Chuck Berry        musician      90 TRUE     1926-10-17    2017-03-17   
#>  4 Bill Paxton        actor         61 TRUE     1955-05-16    2017-02-24   
#>  5 Prince             musician      57 TRUE     1958-06-06    2016-04-20   
#>  6 Alan Rickman       actor         69 FALSE    1946-02-20    2016-01-13   
#>  7 Florence Henderson actor         82 TRUE     1934-02-13    2016-11-23   
#>  8 Harper Lee         author        89 FALSE    1926-04-27    2016-02-18   
#>  9 Zsa Zsa Gábor      actor         99 TRUE     1917-02-05    2016-12-17   
#> 10 George Michael     musician      53 FALSE    1963-06-24    2016-12-24
```

Created on 2023-03-27 with [reprex v2.0.2](https://reprex.tidyverse.org/)

For example, instead of the date of birth for Bowie being
"1947-01-08", it becomes "1947-01-07". The same happens for every
date in the table.

I know that **readxl** read the data correctly because this is the
Excel sheet the data came from. The dates in the sheet and in the
resulting R tibble match exactly.

<img width="502" alt="Annotation 2023-03-27 093609" src="https://user-images.githubusercontent.com/17706062/227954203-ae72f540-730a-4f1d-b472-57cdce131c16.png">
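
The symptom can be reproduced without the Excel file. The sketch below rests on an assumption that the issue thread does not confirm: that the shift is a time zone effect. readxl returns date-times in UTC, `strftime()` formats in the session's local time zone unless `tz` is given, and a zone west of UTC turns midnight into the previous evening. `"America/New_York"` is only an arbitrary stand-in for such a local zone.

```r
# Midnight UTC, like the values readxl produces for date-only Excel cells.
x <- as.POSIXct("1947-01-08 00:00:00", tz = "UTC")

# Taking the calendar date in UTC keeps the day.
utc_date <- as.Date(x, tz = "UTC")

# Assumption: the local zone lies west of UTC. Formatting there and parsing
# the string back mimics the strftime(.x) step followed by as.Date().
local_string <- strftime(x, tz = "America/New_York")
shifted_date <- as.Date(local_string)

utc_date
shifted_date
```

If the assumption holds, `utc_date` keeps 8 January while `shifted_date` falls on 7 January, which matches the one-day loss in the tibble above.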

The packages and versions used:

```
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045), RStudio 2023.3.0.386

Locale:
  LC_COLLATE=English_United States.utf8
  LC_CTYPE=English_United States.utf8
  LC_MONETARY=English_United States.utf8
  LC_NUMERIC=C
  LC_TIME=English_United States.utf8

Package version:
  base64enc_0.1.3    bslib_0.4.2        cachem_1.0.7       callr_3.7.3
  cellranger_1.1.0   cli_3.6.0          clipr_0.8.0        compiler_4.2.3
  cpp11_0.4.3        crayon_1.5.2       digest_0.6.31      dplyr_1.1.0
  ellipsis_0.3.2     evaluate_0.20      fansi_1.0.4        fastmap_1.1.1
  fs_1.6.1           generics_0.1.3     glue_1.6.2         graphics_4.2.3
  grDevices_4.2.3    hablar_0.3.2       highr_0.10         hms_1.1.2
  htmltools_0.5.4    jquerylib_0.1.4    jsonlite_1.8.4     knitr_1.42
  lifecycle_1.0.3    lubridate_1.9.2    magrittr_2.0.3     memoise_2.0.1
  methods_4.2.3      mime_0.12          pillar_1.8.1       pkgconfig_2.0.3
  prettyunits_1.1.1  processx_3.8.0     progress_1.2.2     ps_1.7.3
  purrr_1.0.1        R6_2.5.1           rappdirs_0.3.3     readxl_1.4.2
  rematch_1.0.1      reprex_2.0.2       rlang_1.1.0        rmarkdown_2.20
  rstudioapi_0.14    sass_0.4.5         stats_4.2.3        stringi_1.7.12
  stringr_1.5.0      tibble_3.2.1       tidyselect_1.2.0   timechange_0.2.0
  tinytex_0.44       tools_4.2.3        utf8_1.2.3         utils_4.2.3
  vctrs_0.6.0        withr_2.5.0        xfun_0.37          yaml_2.3.7
```

Solution

  • Please see https://github.com/davidsjoberg/hablar/issues/17 for a potential answer. The contents are reproduced below in case the linked page becomes unavailable:

    For some reason, `strftime()` moves the date back by one day when the time component is midnight. According to the documentation, the following changes were made in R 4.2.0 and later:

    strftime is a wrapper for format.POSIXlt, and it and format.POSIXct first convert to class "POSIXlt" by calling as.POSIXlt (so they also work for class "Date"). Note that only that conversion depends on the time zone. Since R version 4.2.0, that as.POSIXlt() conversion now treats the non-finite numeric -Inf, Inf, NA and NaN differently (where previously all were treated as NA) and also the format() method for POSIXlt now treats these different non-finite times and dates analogously to type double.

    Calling as.Date() directly on variables of class POSIXct solves the problem, so the special-case check for the POSIXct class is not needed. I don't have write permission to open a pull request.

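    ```r
    # Header inferred from the body (.x and ...); the copied text omits it.
    as_reliable_dte <- function(.x, ...) {
      # Dates pass through unchanged.
      if (any(class(.x) == "Date")) {
        return(.x)
      }
      if (is.logical(.x)) {
        stop("Logical vectors can't be converted to date.")
      }
      if (is.factor(.x)) {
        .x <- as.character(.x)
      }
      # Disabled: the string round trip that caused the one-day shift.
      # if (any(class(.x) == "POSIXct")) {
      #   .x <- strftime(.x)
      # }
      if (TRUE) {
        return(as.Date(.x, ...))
      }
    }
    ```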
    
    Note to other users: The function `as_reliable_dte()` is an internal
    function that is called by `dte()`.

    ```{r}
    dte <- function(..., .args = list()) {
      list(vars = dplyr::quos(...), fun = ~as_reliable_dte(., !!!.args))
    }

    A <- read_excel(
      readxl_example("deaths.xlsx"),
      range = "arts!A5:F15",
      .name_repair = "universal"
    )

    A %>%
      hablar::convert(dte(starts_with("Date")))
    ```

    ```
    # A tibble: 10 × 6
       Name               Profession   Age Has.kids Date.of.birth Date.of.death
       <chr>              <chr>      <dbl> <lgl>    <date>        <date>
     1 David Bowie        musician      69 TRUE     1947-01-08    2016-01-10
     2 Carrie Fisher      actor         60 TRUE     1956-10-21    2016-12-27
     3 Chuck Berry        musician      90 TRUE     1926-10-18    2017-03-18
     4 Bill Paxton        actor         61 TRUE     1955-05-17    2017-02-25
     5 Prince             musician      57 TRUE     1958-06-07    2016-04-21
     6 Alan Rickman       actor         69 FALSE    1946-02-21    2016-01-14
     7 Florence Henderson actor         82 TRUE     1934-02-14    2016-11-24
     8 Harper Lee         author        89 FALSE    1926-04-28    2016-02-19
     9 Zsa Zsa Gábor      actor         99 TRUE     1917-02-06    2016-12-18
    10 George Michael     musician      53 FALSE    1963-06-25    2016-12-25
    ```
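
For anyone who cannot patch the package, the same idea (call `as.Date()` directly on the POSIXct values) can be applied without `dte()`. The following is a minimal base R sketch, not hablar code: `posixct_cols_to_date()` is a hypothetical helper name, the small data frame is a stand-in for the Excel data, and `tz = "UTC"` assumes the date-times are stored in UTC as readxl's are.

```r
# Hypothetical helper (not part of hablar): convert every POSIXct column of a
# data frame to Date with as.Date(), skipping the strftime() round trip.
posixct_cols_to_date <- function(df, tz = "UTC") {
  is_dttm <- vapply(df, inherits, logical(1), what = "POSIXct")
  if (any(is_dttm)) {
    df[is_dttm] <- lapply(df[is_dttm], as.Date, tz = tz)
  }
  df
}

# Stand-in for the Excel data: two midnight UTC date-times.
deaths <- data.frame(
  Name = c("David Bowie", "Carrie Fisher"),
  Date.of.birth = as.POSIXct(c("1947-01-08", "1956-10-21"), tz = "UTC")
)

fixed <- posixct_cols_to_date(deaths)
class(fixed$Date.of.birth)
fixed$Date.of.birth
```

Passing `tz` explicitly keeps the result independent of the session's local time zone; columns that are not POSIXct are left untouched.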