
Why does lubridate::mdy() add day when it is missing from my input?


I am purposefully leaving out the day in this code and am expecting it to fail, return a warning, or return an incomplete record.

txt <- "January 2010"
lubridate::mdy(txt)

The output is "2010-01-20". Why does it include '20' when I never supplied a day? What is the logic behind that value?


Solution

  • It comes down to the parsing order and how the format is guessed. According to ?mdy:

    In case of heterogeneous date formats, the ymd() family guesses formats based on a subset of the input vector. If the input vector contains many missing values or non-date strings, the subset might not contain meaningful dates

    The original string has a month followed by a 4-digit year, but mdy() expects month, day, year, and the year may have either 2 or 4 digits. With no separate day present, the ambiguity is resolved by reading '20' as the day and '10' as a 2-digit year, which becomes 2010. If we instead append a day and parse with myd() (month, year, day), the year is read as 4 digits:

    lubridate::myd(paste(txt, '01'))
    #[1] "2010-01-01"
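To see the digit split in isolation, here is a small base-R sketch. It is only an illustration of the same reading, not lubridate's internal parser. It assumes the month is written as the number "01" instead of "January", so the result does not depend on the LC_TIME locale that a month-name format would need. Reading "2010" as a day followed by a 2-digit year gives the same 2010-01-20, while reading it as a 4-digit year leaves no day, and base R returns NA instead of inventing one.

```r
# Illustration only: base R's strptime-based parsing, not lubridate's guesser.
# The month is numeric ("01") so the result does not depend on the locale.

# %d consumes at most two digits ("20"), then %y consumes the next two ("10");
# two-digit years 00-68 are mapped to 20xx, so 10 becomes 2010.
split_read <- as.Date("01 2010", format = "%m %d%y")
print(split_read)
# [1] "2010-01-20"

# Reading all four digits as %Y leaves no day field; as.Date() then
# returns NA rather than filling in a day.
year_read <- as.Date("01 2010", format = "%m %Y")
print(year_read)
# [1] NA
```

The first call mirrors what mdy("January 2010") ends up doing once "January" is matched as the month. If the input really is month-and-year only, appending a day and using myd() as shown above avoids the ambiguity; recent lubridate releases also include my() for month-year strings, but check that your installed version provides it before relying on it.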