r datetime date-conversion lubridate

Date conversion without specifying the format


I do not understand how the ymd() function from the lubridate package works in R. I am trying to build a feature that converts dates correctly without having to specify the format. To do this, I run dmy(), mdy() and ymd() on the data and check which of them produces the fewest NAs.

However, ymd() sometimes returns NA and sometimes does not for the same date value. Are there any other functions or packages in R that would help me get around this problem?

> data$DTTM[1:5]
[1] "4-Sep-06"  "27-Oct-06" "8-Jan-07"  "28-Jan-07" "5-Jan-07" 

> ymd(data$DTTM[1])
[1] NA
Warning message:
All formats failed to parse. No formats found. 
> ymd(data$DTTM[2])
[1] "2027-10-06 UTC"
> ymd(data$DTTM[3])
[1] NA
Warning message:
All formats failed to parse. No formats found. 
> ymd(data$DTTM[4])
[1] "2028-01-07 UTC"
> ymd(data$DTTM[5])
[1] NA
Warning message:
All formats failed to parse. No formats found. 
> 

> ymd(data$DTTM[1:5])
[1] "2004-09-06 UTC" "2027-10-06 UTC" "2008-01-07 UTC" "2028-01-07 UTC"
[5] "2005-01-07 UTC"

Thanks


Solution

  • @user1317221_G has already pointed out that your dates are in day-month-year format, which suggests that you should use dmy instead of ymd. Furthermore, because your month is in %b format ("Abbreviated month name in the current locale"; see ?strptime), your problem may be related to your locale. The month names you have appear to be English, and their spelling may differ from the abbreviations used in the locale you are currently using.

    Let's see what happens when I try dmy on the dates in my locale:

    date_english <- c("4-Sep-06",  "27-Oct-06", "8-Jan-07",  "28-Jan-07", "5-Jan-07")
    dmy(date_english)
    
    # [1] "2006-09-04 UTC" NA               "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
    # Warning message:
    #  1 failed to parse.
    

    "27-Oct-06" failed to parse. Let's check my time locale:

    Sys.getlocale("LC_TIME")
    # [1] "Norwegian (Bokmål)_Norway.1252"
    

    dmy does not recognize "Oct" as a valid %b month abbreviation in my locale.
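
    To check which abbreviations %b corresponds to in your own session, a minimal base R sketch like the following may help. The variable name local_abbr is only illustrative, and the result depends on your platform and LC_TIME setting:

    ```r
    # Month abbreviations that %b produces in the current LC_TIME locale.
    local_abbr <- format(ISOdate(2000, 1:12, 1), "%b")
    local_abbr

    # English abbreviations (base R constant month.abb) that have no
    # case-insensitive match among the current locale's abbreviations.
    setdiff(tolower(month.abb), tolower(local_abbr))
    ```

    Any abbreviation listed by the second call is a likely candidate for a parse failure when English dates are read in that locale.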

    One way to deal with this issue would be to change "Oct" to the corresponding Norwegian abbreviation, "Okt":

    date_nor <- c("4-Sep-06",  "27-Okt-06", "8-Jan-07",  "28-Jan-07", "5-Jan-07" )
    dmy(date_nor)
    # [1] "2006-09-04 UTC" "2006-10-27 UTC" "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
    

    Another possibility is to use the original dates (i.e. in their original 'locale') and set the locale argument in dmy. Exactly how this is done is platform dependent (see ?locales). Here is how I would do it on Windows:

    dmy(date_english, locale = "English")
    # [1] "2006-09-04 UTC" "2006-10-27 UTC" "2007-01-08 UTC" "2007-01-28 UTC" "2007-01-05 UTC"
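
    If setting a locale is not convenient, a third option is to avoid %b altogether. The following is only a sketch in base R, not part of lubridate: parse_dmy_english is a hypothetical helper, and it assumes that the input is always "day-Mon-yy" with English abbreviations and that every two-digit year belongs to the 2000s. It looks the month up in the built-in month.abb constant, which is always English:

    ```r
    # Hypothetical helper: parse "d-Mon-yy" strings with English month
    # abbreviations, independently of the current LC_TIME locale.
    parse_dmy_english <- function(x, century = 2000L) {
      parts <- strsplit(x, "-", fixed = TRUE)
      # Extract the i-th field of every date string (NA if it is missing).
      field <- function(i) vapply(parts, function(p) p[i], character(1))

      day   <- as.integer(field(1))
      # month.abb holds "Jan", ..., "Dec" regardless of locale.
      month <- match(tolower(field(2)), tolower(month.abb))
      # Assumption: all two-digit years fall in the given century.
      year  <- as.integer(century) + as.integer(field(3))

      # Purely numeric year-month-day strings do not depend on the locale;
      # anything that could not be matched becomes NA.
      as.Date(sprintf("%04d-%02d-%02d", year, month, day), format = "%Y-%m-%d")
    }

    date_english <- c("4-Sep-06", "27-Oct-06", "8-Jan-07", "28-Jan-07", "5-Jan-07")
    parse_dmy_english(date_english)
    # [1] "2006-09-04" "2006-10-27" "2007-01-08" "2007-01-28" "2007-01-05"
    ```

    Note that this returns Date objects rather than the POSIXct values shown above, and it does not attempt to guess the field order, so it is only suitable when the format is known in advance.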