
53rd week of the year in R?


I have week-date data in the form yyyy-ww, where ww is the two-digit week number. The data span 2007-01 to 2010-30. The week counting convention is ISO 8601, under which a year occasionally has 53 weeks (see Wikipedia's "Week number" article). For example, 2009 had 53 weeks under this system; see the week numbers in this ISO 8601 calendar. (See other years; as per the Wikipedia article, 53rd weeks are fairly rare.)

Basically, I want to read the week date in, convert it to a Date object and save this to a separate column in a data.frame. As a test, I converted the Date objects back to yyyy-ww format with format([Date-object], format = "%Y-%W"), and this failed at 2009-53: R does not interpret that week as a date and returns NA. This is very odd, because some years that do not have a 53rd week under ISO 8601 are converted fine, such as 2007-53, whereas other years that also lack a 53rd week under ISO 8601 fail, such as 2008-53.

The following minimal example demonstrates the issue.

Minimal example:

dates <- c("2009-50", "2009-51", "2009-52", "2009-53", "2010-01", "2010-02")
as.Date(x = paste(dates, 1), format = "%Y-%W %w")
# [1] "2009-12-14" "2009-12-21" "2009-12-28" NA           "2010-01-04"
# [6] "2010-01-11"

other.dates <- c("2007-53", "2008-53", "2009-53", "2010-53")
as.Date(x = paste(other.dates, 1), format = "%Y-%W %w")
# [1] "2007-12-31" NA           NA           NA     

The question is, how do I get R to accept week numbers in ISO 8601 format?

Note: This question summarises a problem I have been struggling with for a few hours. I have searched and found various helpful posts such as this, but none solved the problem.


Solution

  • The package ISOweek handles ISO 8601 week numbering, converting to and from Date objects in R. See ISOweek for more. Continuing with the example dates above, we first need to modify the formatting a bit: the strings must be in the form yyyy-Www-w rather than yyyy-ww, e.g. 2009-W53-1. The final digit selects the day of the week that represents the week (1 is Monday, which is what we use here). The week number must have two digits.

    library(ISOweek)
    
    dates <- c("2009-50", "2009-51", "2009-52", "2009-53", "2010-01", "2010-02")
    other.dates <- c("2007-53", "2008-53", "2009-53", "2010-53")
    
    dates <- sub("(\\d{4}-)(\\d{2})", "\\1W\\2-1", dates)
    other.dates <- sub("(\\d{4}-)(\\d{2})", "\\1W\\2-1", other.dates)
    
    ## Check:
    dates
    # [1] "2009-W50-1" "2009-W51-1" "2009-W52-1" "2009-W53-1" "2010-W01-1"
    # [6] "2010-W02-1"
    
    (iso.date <- ISOweek2date(dates))             # week 53 of 2009 is handled correctly
    # [1] "2009-12-07" "2009-12-14" "2009-12-21" "2009-12-28" "2010-01-04"
    # [6] "2010-01-11"
    (iso.other.date <- ISOweek2date(other.dates)) # also deals with this
    # [1] "2007-12-31" "2008-12-29" "2009-12-28" "2011-01-03"
    
    ## Check that back-conversion works:
    all(date2ISOweek(iso.date) == dates)
    # [1] TRUE
    
    ## This does not work for the others, since the 53rd week of
    ## e.g. 2008 is back-converted to the first week of 2009, in
    ## line with the ISO 8601 standard.
    date2ISOweek(iso.other.date) == other.dates
    # [1] FALSE FALSE  TRUE FALSE
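
If adding a package is not an option, the same conversion can probably be sketched in base R. The following is a minimal sketch, not the ISOweek implementation: the helper name iso_week_monday is hypothetical, and it relies on the ISO 8601 rule that 4 January always falls in week 1. Like ISOweek2date above, it does not check whether a 53rd week actually exists for the given year, so 2008-53 simply rolls over into the following ISO year.

```r
# Hypothetical helper (not part of ISOweek): Monday of a given ISO 8601 week.
# Assumes the ISO rule that 4 January always lies in week 1 of its year.
iso_week_monday <- function(year, week) {
  jan4 <- as.Date(sprintf("%d-01-04", as.integer(year)))
  # POSIXlt wday is 0 = Sunday ... 6 = Saturday; shift so that Monday = 0
  days_since_monday <- (as.POSIXlt(jan4)$wday + 6) %% 7
  week1_monday <- jan4 - days_since_monday
  # Step forward whole weeks from the Monday of week 1
  week1_monday + 7 * (as.integer(week) - 1)
}

dates <- c("2009-50", "2009-51", "2009-52", "2009-53", "2010-01", "2010-02")
parts <- strsplit(dates, "-", fixed = TRUE)

# Store the converted dates in a separate column, as asked in the question
df <- data.frame(
  week = dates,
  date = iso_week_monday(sapply(parts, `[`, 1), sapply(parts, `[`, 2))
)
df$date
# [1] "2009-12-07" "2009-12-14" "2009-12-21" "2009-12-28" "2010-01-04"
# [6] "2010-01-11"
```

For going the other way, format(df$date, "%G-%V") should give the ISO year and week on platforms where R's format() supports those specifiers. According to ?strptime they are accepted but ignored on input, which is why they cannot replace "%Y-%W" in as.Date().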