Search code examples
rdatereadxl

R date parsing using read_excel function


When using the read_excel function the dates in the spreadsheet, in the column FuelEventDateTime, are in format "dd/mm/yyyy hr:mm:ss" (example: 03/05/2019 9:19:00 AM) is parsed as a character string with a format like this: example: 43588.849xxxxx (being x any number). I cannot set this column to the correct date class, and I don't know what that number can mean but have seen it several times in Excel.

Tried to separate the "." in the character string, set the column as.numeric, and tried several functions in lubridate, R base and anydate library, as maybe that number is a date in epoch format in origin "1900-01-01"

Read data

sys_raw <- read_excel("Advanced Fill-Ups Report 15052019_165240.xlsx", sheet = "Data", col_names = FALSE) 

col_names_sys <- sys_raw[11,] 

sys_tidy <- sys_raw[12:ncol(sys_raw),] %>% 
  setNames(col_names_sys) %>% 
  select(DeviceName, FuelEventDateTime,FuelUsedEventDistance)

Noticed the character string as numbers, tried separate "." and set as numeric

sys_tidy <- sys_tidy %>% 

  mutate(FuelEventDateTime = str_split(FuelEventDateTime, "\\.")) %>% 

  separate(FuelEventDateTime, c("c","date","time")) %>% 

  separate(DeviceName, c("Device"), sep = "\\s") %>% 

  select(Device, date, FuelUsedEventDistance) %>% 

  mutate(date = as.numeric(date)) 

sys_tidy <- sys_tidy %>% 

  as.Date(date, origin = "1900-01-01") 

Actual results of this are errors, the expected result is a column date with a date class in the format "dd/mm/yyyy", don't need time.

Example of error messages:

Error in as.Date.default(., date, origin = "1900-01-01") :do not know how to convert '.' to class “Date”

Error in as.POSIXct.default(., date, origin = "1900-01-01") :do not know how to convert '.' to class “POSIXct”


Solution

  • sys_tidy <- sys_tidy %>% 
       as.Date(date, origin = "1900-01-01") 
    

    is equivalent to

    sys_tidy <- as.Date(sys_tidy, date, origin = "1900-01-01") 
    

    You probably mean

    sys_tidy <- sys_tidy %>% 
       mutate(date = as.Date(date, origin = "1900-01-01"))
    

    Otherwise you are plugging a data frame into the first term of as.Date and R doesn't know what to do with that. From ?as.Date: The as.Date methods accept character strings, factors, logical NA and objects of classes "POSIXlt" and "POSIXct".

    mutate, from dplyr, understands that you will be working with one or more columns within the data frame (sys_tidy) that was fed into it with the %>% pipe, and assigns the output to the column called date therein.

    The base R equivalent would be similar, but would require that the input and the output both specify the context for the date column, which lives within the sys_tidy data frame.

    sys_tidy$date = as.Date(sys_tidy$date, origin = "1900-01-01"))