I have a very messy date column like this in excel:
For example, the row which the row_id=23 in excel is input as dd/mm/yy format, so it should be read as 2023-09-07 when I import this worksheet into R, but now, when I use rio::read to import this dataset into R, R automatically convert it as numeric format as "45116", and when I use this code :
df_dl2 <- df_dl %>%
mutate(across(c(`specimen taken`, `specimen received at NHL`),
function(x) as_datetime(ifelse(grepl("^[0-9\\.]+$", x),
as.numeric(x) |> as_date(origin = "1899-12-30") |> as_datetime(),
as_datetime(x, format = "%d-%m-%Y"))),
.names = "cleaned_{.col}"))
to convert it back to the dd/mm/yy format, it shows as 2023-07-09 instead of the correct one:2023-09-07.
It happens on all those numerical stored date.They are all input in excel as 1/8/23, 1/9/23... they should be all read as dd/mm/yy but excel and r both think they are mm/dd/yy format and convert it as this rule.
I am confused from where I can tell either excel or R that this is not U.S. date format mm/dd/yy. It looks like when the dataset was read by r, it has been converted as numerical format date as mm/dd/yy. How can I get the correct convert in R?
Probably you can find some interesting clues here
> as.Date(rowSums(sapply(fmts, as.Date, x = s), TRUE), origin = "1970-01-01")
[1] "2023-08-14" "2023-09-01" "2023-08-17" "2023-09-07"
or using coalesce
> do.call(dplyr::coalesce, lapply(fmts, as.Date, x = s))
[1] "2023-08-14" "2023-09-01" "2023-08-17" "2023-09-07"
> s <- c("14-08-2023", "1/9/23", "17-08-2023", "7/9/23")
> fmts <- c("%d/%m/%y", "%d-%m-%Y")