Parsing/formatting odd date formats with lubridate


I'm having some trouble parsing the following dates with lubridate. I'm not married to the lubridate approach, but can someone recommend a good way to parse these wonky "Sept" dates?

library(lubridate)

df <- data.frame(y=1:5, Date=c("Sept 1 2002","Sept 7 2002","Sept 9 2002","Sept 20 2002","Sept 21 2002"))

I didn't really expect this to work:

df$Date2=mdy(df$Date)

But I do not understand why this one didn't work:

df$Date2=parse_date_time(df$Date, "%b %d %Y")

Any ideas?


Solution

  • It works once the month abbreviation matches one in month.abb. One option is to remove the 't' from 'Sept' with sub():

     mdy(sub('(...).', '\\1', df$Date))
     #[1] "2002-09-01 UTC" "2002-09-07 UTC" "2002-09-09 UTC" "2002-09-20 UTC" "2002-09-21 UTC"
    

    For comparison, R's built-in month abbreviations are:

    month.abb
    #[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
    

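    The ?strptime help page describes %b as:

    %b: Abbreviated month name in the current locale on this platform. (Also matches full name on input: in some locales there are no abbreviations of names.)

    In an English locale 'Sept' is neither the abbreviation ('Sep') nor the full name ('September'), so a "%b %d %Y" format fails on it.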
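Note that sub('(...).', '\\1', ...) keeps the first three characters of each string and drops the fourth, whatever it is. It works here only because every date starts with 'Sept'; on a value such as 'Oct 3 2002' it would delete the space instead. Below is a minimal base R sketch (no lubridate needed), assuming English month names and that 'Sept' is the only non-standard spelling. It first lists the month tokens that R does not recognise, then rewrites only a leading 'Sept' and parses with as.Date(). The variable names and the extra 'Oct 3 2002' value are illustrative, and LC_TIME is switched to the C locale temporarily because %b is locale-dependent.

```r
# Illustrative input: the question's "Sept" dates plus one standard date
dates <- c("Sept 1 2002", "Sept 7 2002", "Sept 20 2002", "Oct 3 2002")

# Month token = text before the first space; keep tokens that are neither
# a standard abbreviation (month.abb) nor a full name (month.name)
month_tokens <- sub(" .*", "", dates)
unmatched <- setdiff(month_tokens, c(month.abb, month.name))
print(unmatched)  # "Sept"

# Positional pattern: drops the 4th character of every string,
# so "Oct 3 2002" becomes "Oct3 2002"
positional <- sub("(...).", "\\1", dates)

# Targeted alternative: rewrite only a leading "Sept " as "Sep "
cleaned <- sub("^Sept ", "Sep ", dates)

# %b depends on the LC_TIME locale; parse in the C locale (English month
# names) and restore the previous setting afterwards
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
parsed <- as.Date(cleaned, format = "%b %d %Y")
invisible(Sys.setlocale("LC_TIME", old_locale))

print(parsed)  # 2002-09-01 2002-09-07 2002-09-20 2002-10-03
```

In an English locale, passing the cleaned vector to lubridate instead should also work, for example mdy(cleaned) or the question's parse_date_time() call with "%b %d %Y".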