Search code examples
rdatetime-formatlubridatezooopenxlsx

Parsing one column with 5 didgits integers as date format while reading excel file


For an excel file (download from here):

enter image description here

df <- openxlsx::read.xlsx('sample_data.xlsx', sheet='Sheet1', colNames=TRUE)
df

Output:

   date  value
1 43861   5.70
2 43890 -13.89
3 43921 -49.68
4 43951 -62.81

I try to convert date column to a normal date format:

> df %>% 
+   mutate(date=as.Date(date, origin = "1970-01-01"))
        date  value
1 2090-02-01   5.70
2 2090-03-02 -13.89
3 2090-04-02 -49.68
4 2090-05-02 -62.81

> df %>% 
+   mutate(date=as.Date(date, origin = "1910-01-01"))
        date  value
1 2030-02-01   5.70
2 2030-03-02 -13.89
3 2030-04-02 -49.68
4 2030-05-02 -62.81

I tested with 1970-01-01 and 1910-01-01 as value for origin parameter, the dates in the output seems incorrect (43861 has been convert to 2090-02-01 and 2030-02-01, which should be 2020-01-31).


Solution

  • origin has to be inside the as.Date call.

    df %>%
      mutate(date = as.Date(date, origin = "1899-12-30"))
    #>         date  value
    #> 1 2020-01-31   5.70
    #> 2 2020-02-29 -13.89
    #> 3 2020-03-31 -49.68
    #> 4 2020-04-30 -62.81