I am working with a character vector of 468006 elements. Each element is a date/time in one of two formats.
A snippet of the vector is as follows:
> result_date_time_vector[1810:1820]
[1] "2021-01-03 02:22:27" "2021-01-03 02:22:27" "2021-01-03 02:22:27" "2021-01-03 02:22:27" "2021-01-03 02:22:27" "2021-01-03 02:22:27"
[7] "1/3/2021" "2021-01-03 13:12:57" "2021-01-03 13:12:57" "2021-01-03 13:12:57" "2021-01-03 13:12:57"
> class(result_date_time_vector)
[1] "character"
I would like to remove the information about time and then convert the elements to a single consistent format.
I tried a for loop. It ran without errors or warnings, but it was very slow:
> library(stringr)    # str_detect()
> library(lubridate)  # ymd_hms(), mdy(), as_date()
> fixed_result_date_time <- rep(NA, length(result_date_time_vector))
> class(fixed_result_date_time) <- "Date"
> for (n in seq_along(result_date_time_vector)) {
  if (is.na(result_date_time_vector[n])) {
    next
  } else if (str_detect(result_date_time_vector[n], "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}")) {
    fixed_result_date_time[n] <- as_date(ymd_hms(result_date_time_vector[n], tz = "America/New_York"))
  } else {
    fixed_result_date_time[n] <- as_date(mdy(result_date_time_vector[n], tz = "America/New_York"))
  }
}
I also tried the ifelse function. It was fast, but it produced a lot of warnings:
> fixed_result_date_time <- ifelse(str_detect(result_date_time_vector, "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"),
                                   as_date(ymd_hms(result_date_time_vector, tz = "America/New_York")),
                                   as_date(mdy(result_date_time_vector, tz = "America/New_York")))
Warning: 20963 failed to parse.
Warning: 447043 failed to parse.
> class(fixed_result_date_time) <- "Date"
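The input vector contains 20963 elements in the m/d/y format and 447043 elements in the y-m-d h:m:s format. These counts match the two warnings: ifelse evaluates both branches on the whole vector, so each parser fails on the elements that are in the other format.
Is there a more efficient way to do this without the warnings?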
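parse_date_time() from lubridate accepts several candidate formats through its orders argument, so both formats can be parsed in one vectorized call. date() then drops the time part. Below, vector stands for the character vector to convert:
library(lubridate)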
date(parse_date_time(vector, orders = c('ymd HMS', 'mdy')))
[1] "2021-01-03" "2021-01-03" "2021-01-03" "2021-01-03" "2021-01-03" "2021-01-03"
[7] "2021-01-03" "2021-01-03" "2021-01-03" "2021-01-03" "2021-01-03"
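If an extra dependency is not wanted, the same result can probably be reached in base R by splitting on the pattern first and parsing each subset with its own format, so that no parser ever sees a string in the other format. The following is only a sketch under assumptions: the helper name split_parse_dates and the sample vector are made up for illustration, and it assumes every non-NA element is in exactly one of the two formats shown in the question. It also takes the calendar date as written in the string, which should agree with the as_date(..., tz = "America/New_York") result from the question because no time zone conversion is involved.

```r
# Hypothetical helper (not part of any package): convert a character vector
# mixing "YYYY-MM-DD HH:MM:SS" and "M/D/YYYY" strings to Date, vectorized.
split_parse_dates <- function(x) {
  # Start with an all-NA Date vector so NA inputs stay NA.
  out <- rep(as.Date(NA), length(x))

  # grepl() returns FALSE for NA, so NA elements fall into neither subset.
  is_iso <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2} ", x)
  is_mdy <- !is.na(x) & !is_iso

  # Keep only the first 10 characters (the date part) of the ISO-style strings.
  out[is_iso] <- as.Date(substr(x[is_iso], 1, 10), format = "%Y-%m-%d")

  # Parse the remaining strings as month/day/year.
  out[is_mdy] <- as.Date(x[is_mdy], format = "%m/%d/%Y")

  out
}

# Made-up sample covering both formats and a missing value.
sample_input <- c("2021-01-03 02:22:27", "1/3/2021", NA, "2021-01-03 13:12:57")
fixed <- split_parse_dates(sample_input)
print(fixed)
class(fixed)
```

Because each as.Date() call only receives strings in its own format, the "failed to parse" situation from the ifelse attempt does not arise. Any element in a third, unexpected format would come back as NA, so a check such as sum(is.na(fixed)) against sum(is.na(sample_input)) is worth running on the real data.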