I am purposefully leaving out the day in this code and am expecting it to fail, return a warning, or return an incomplete record.
txt <- "January 2010"
lubridate::mdy(txt)
The output is "2010-01-20". Why does it include '20' when my input contains no day? What is the logic behind that value?
This comes from the order of parsing. According to ?mdy:
In case of heterogeneous date formats, the ymd() family guesses formats based on a subset of the input vector. If the input vector contains many missing values or non-date strings, the subset might not contain meaningful dates
The original string is a month followed by a 4-digit year, whereas mdy expects month, day, year, and the year can be either 2 or 4 digits. With no day in the input this is ambiguous: the four digits '2010' are split, so '20' is parsed as the day and '10' as a 2-digit year (2010). If we instead add a day and use myd (month, year, day), the year is parsed as a 4-digit year:
lubridate::myd(paste(txt, '01'))
#[1] "2010-01-01"
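If padding the string with a day feels like a workaround, one alternative is to tell lubridate explicitly that the input holds only a month and a year. A minimal sketch, assuming a reasonably recent lubridate (the my() helper is not available in older releases) and an English locale for the month name:

```r
library(lubridate)

txt <- "January 2010"

# Spell out the order explicitly: month, then year; no day is expected.
# parse_date_time() returns a POSIXct, so convert it to a Date.
d1 <- as.Date(parse_date_time(txt, orders = "my"))

# Shorthand for the same month-year order in newer lubridate versions
d2 <- my(txt)

print(d1)
print(d2)
```

Both calls fill in the missing day with the first of the month, so the result is 2010-01-01 rather than a date built from the split year digits.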