r, strptime, as.date

Issue with R date format "%Y-%V"


I don't understand the following behaviour:

> format(as.Date("2017-01-01"), "%Y-%V")
[1] "2017-52"
> as.Date("2017-01-01")
[1] "2017-01-01"
> format(as.Date("2017-01-01")-1, "%Y-%V")
[1] "2016-52"

I expected the first call to return 2016-52 rather than 2017-52, because 1 January 2017 falls in a week that has fewer than four days in 2017.

Any idea what went wrong?

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] plotly_4.7.1       highcharter_0.5.0  flexdashboard_0.5  xts_0.10-0         zoo_1.8-0         
 [6] rmdformats_0.3.3   dygraphs_1.1.1.4   pROC_1.10.0        elasticnet_1.1     lars_1.2          
[11] here_0.1           brew_1.0-6         Hmisc_4.0-3        Formula_1.2-2      stringr_1.2.0     
[16] devtools_1.13.3    RODBC_1.3-15       caret_6.0-77       testthat_1.0.2     rmarkdown_1.6     
[21] gbm_2.1.3          lattice_0.20-35    survival_2.41-3    data.table_1.10.4  rgdal_1.2-11      
[26] sp_1.2-5           colorRamps_2.3     gridExtra_2.3      DT_0.2             knitr_1.17        
[31] reshape2_1.4.2     scales_0.5.0.9000  ggmap_2.6.1        dplyr_0.7.3        purrr_0.2.3       
[36] readr_1.1.1        tidyr_0.7.1        tibble_1.3.4       ggplot2_2.2.1.9000 tidyverse_1.1.1   
[41] pacman_0.4.6      

loaded via a namespace (and not attached):
 [1] readxl_1.0.0        backports_1.1.0     plyr_1.8.4          igraph_1.1.2       
 [5] lazyeval_0.2.1      digest_0.6.12       foreach_1.4.3       htmltools_0.3.6    
 [9] magrittr_1.5        checkmate_1.8.3     memoise_1.1.0       cluster_2.0.6      
[13] recipes_0.1.0       modelr_0.1.1        gower_0.1.2         dimRed_0.1.0       
[17] jpeg_0.1-8          colorspace_1.3-2    rvest_0.3.2         haven_1.1.0        
[21] crayon_1.3.4        jsonlite_1.5        bindr_0.1           iterators_1.0.8    
[25] glue_1.1.1          DRR_0.0.2           gtable_0.2.0        ipred_0.9-6        
[29] questionr_0.6.1     kernlab_0.9-25      ddalpha_1.2.1       quantmod_0.4-11    
[33] DEoptimR_1.0-8      maps_3.2.0          miniUI_0.1.1        Rcpp_0.12.14       
[37] viridisLite_0.2.0   xtable_1.8-2        htmlTable_1.9       foreign_0.8-69     
[41] mapproj_1.2-5       stats4_3.4.1        lava_1.5            prodlim_1.6.1      
[45] htmlwidgets_0.9     httr_1.3.1          RColorBrewer_1.1-2  geosphere_1.5-5    
[49] acepack_1.4.1       pkgconfig_2.0.1     nnet_7.3-12         rlang_0.1.2        
[53] munsell_0.4.3       cellranger_1.1.0    tools_3.4.1         broom_0.4.2        
[57] evaluate_0.10.1     yaml_2.1.14         ModelMetrics_1.1.0  robustbase_0.92-7  
[61] RgoogleMaps_1.4.1   bindrcpp_0.2        nlme_3.1-131        mime_0.5           
[65] RcppRoll_0.2.2      xml2_1.1.1          compiler_3.4.1      rstudioapi_0.7     
[69] curl_2.8.1          png_0.1-7           stringi_1.1.5       highr_0.6          
[73] forcats_0.2.0       Matrix_1.2-10       psych_1.7.8         httpuv_1.3.5       
[77] R6_2.2.2            latticeExtra_0.6-28 bookdown_0.5        codetools_0.2-15   
[81] MASS_7.3-47         assertthat_0.2.0    CVST_0.2-1          proto_1.0.0        
[85] rprojroot_1.2       rjson_0.2.15        withr_2.1.0.9000    mnormt_1.5-5       
[89] rlist_0.4.6.1       hms_0.3             grid_3.4.1          rpart_4.1-11       
[93] timeDate_3012.100   class_7.3-14        TTR_0.23-2          shiny_1.0.5        
[97] lubridate_1.6.0     base64enc_0.1-3  

Solution

  • Nothing is wrong with your machine; this is the expected behaviour:

    > format(as.Date("2018-01-01"), "%Y-%V")
    [1] "2018-01"
    > format(as.Date("2017-01-01"), "%Y-%V")
    [1] "2017-52"
    

    %V

    Week of the year as decimal number (01–53) as defined in ISO 8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise, it is the last week of the previous year, and the next week is week 1. (Accepted but ignored on input.)

    Why the result is week 52:

    From Wikipedia (ISO 8601):

    If 1 January is on a Monday, Tuesday, Wednesday or Thursday, it is in week 01. If 1 January is on a Friday, Saturday or Sunday, it is in week 52 or 53 of the previous year (there is no week 00). 28 December is always in the last week of its year.
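The rule quoted above can be checked directly in R. This is only a rough sketch: the years 2015 to 2018 are picked for illustration, and it assumes your platform's date formatting supports %u (ISO weekday, 1 = Monday) alongside %V.

```r
# 1 January of a few years (chosen only for illustration)
jan1 <- as.Date(paste0(2015:2018, "-01-01"))

# %u is the ISO weekday (1 = Monday ... 7 = Sunday), %V the ISO week number
weekday  <- as.integer(format(jan1, "%u"))
iso_week <- format(jan1, "%V")

# Monday to Thursday should land in week 01, Friday to Sunday in week 52 or 53
print(data.frame(date = jan1, weekday = weekday, iso_week = iso_week))
```

Sunday 1 January 2017 has weekday 7, so it falls in the last week of the previous ISO year rather than in week 01.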

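    Why is the year still 2017?

    Because %Y is the calendar year, while %V is a week number within the ISO week-based year. Sunday 1 January 2017 belongs to week 52 of the ISO year 2016, so combining %Y with %V mixes two different year systems and gives "2017-52".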

    More examples from Wikipedia:

    The ISO week-numbering year starts at the first day (Monday) of week 01 and ends at the Sunday before the new ISO year (hence without overlap or gap). It consists of 52 or 53 full weeks. The first ISO week of a year may have up to three days that are actually in the Gregorian calendar year that is ending; if three, they are Monday, Tuesday and Wednesday. Similarly, the last ISO week of a year may have up to three days that are actually in the Gregorian calendar year that is starting; if three, they are Friday, Saturday, and Sunday. The Thursday of each ISO week is always in the Gregorian calendar year denoted by the ISO week-numbering year.

    Examples:

    Monday 29 December 2008 is written "2009-W01-1"
    Sunday 3 January 2010 is written "2009-W53-7"
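The two Wikipedia examples can be reproduced in R as a sanity check. A small sketch, assuming the %G (week-based year) and %u (ISO weekday) specifiers described in ?strptime are available on your platform:

```r
# The two dates from the Wikipedia examples
d <- as.Date(c("2008-12-29", "2010-01-03"))

# %G = ISO week-based year, %V = ISO week number, %u = ISO weekday (1 = Monday)
iso_dates <- format(d, "%G-W%V-%u")
print(iso_dates)
```

Both dates get an ISO year that differs from their calendar year, which is the same effect seen with 2017-01-01.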
    

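    tl;dr: This is how ISO 8601 week numbering works and has nothing to do with R. If you want week numbers that stay within the calendar year, use %U instead (week of the year, 00–53, with Sunday as the first day of the week):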

    > format(as.Date("2017-01-01"), "%Y-%U")
    [1] "2017-01"
    > format(as.Date("2016-01-01"), "%Y-%U")
    [1] "2016-00"
    > format(as.Date("2016-01-02"), "%Y-%U")
    [1] "2016-00"
    > format(as.Date("2016-12-02"), "%Y-%U")
    [1] "2016-48"
    > format(as.Date("2016-12-31"), "%Y-%U")
    [1] "2016-52"
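If what you actually want is the ISO year that goes with the ISO week (the 2016-52 the question expected), ?strptime also lists %G, the week-based year. A minimal sketch, assuming %G is supported on your platform in the same way %V is:

```r
d <- as.Date("2017-01-01")

# %Y is the calendar year, so pairing it with the ISO week mixes two systems
mixed <- format(d, "%Y-%V")

# %G is the ISO week-based year, which matches the week number given by %V
iso <- format(d, "%G-%V")

print(c(mixed = mixed, iso = iso))
```

Here `mixed` reproduces the "2017-52" from the question, while `iso` gives "2016-52". Which one to use depends on whether you need ISO weeks (%G with %V) or weeks counted within the calendar year (%Y with %U).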