I am working on a data frame that has a column that uses the date and time format as "mm/dd/yy HH:MM" for some observations and "yyyy/mm/dd HH:MM:SS" format for other observations, of course, this inconsistency results in errors or NA returns in my code, how can I unify the whole column so my calculations are not interrupted by this inconsistency?
Update step by step:
We have a data frame with for columns each of them have character type columns (you can check this with str(df)
In order to change the format from character to datetime in all four columns we use mutate(across(1:4, ...
What we want is that in each column 1:4 the character type is changed to datetime
this can be done with the function parse_date_time
from lubridate package
Here we use ~
to indicate an anonymous function
the .
indicates column 1-4.
and most important the argument c("ymd_HMS", "mdy_HM")
which gives the order of the different formats of the date columns!
We could use parse_date_time()
from lubridate
package. Important is the argument c("ymd_HMS", "mdy_HM")
. Here you define the occurence of the different formats:
and note to use HMS
, because:
hms, hm and ms usage is defunct, please use HMS, HM or MS instead. Deprecated in version '1.5.6'.
library(dplyr)
library(lubridate)
df %>%
mutate(across(1:4, ~parse_date_time(., c("ymd_HMS", "mdy_HM"))))
started_at ended_at started_at_1 ended_at_1
<dttm> <dttm> <dttm> <dttm>
1 2021-10-29 17:42:36 2021-10-29 18:00:23 2021-06-13 11:40:00 2021-06-13 12:02:00
2 2021-10-01 15:06:10 2021-10-01 15:09:23 2021-06-27 16:26:00 2021-06-27 16:39:00
3 2021-10-28 23:02:53 2021-10-28 23:07:11 2021-06-10 20:06:00 2021-06-10 20:28:00
4 2021-10-17 00:58:17 2021-10-17 01:02:08 2021-06-11 15:54:00 2021-06-11 16:11:00
5 2021-10-27 18:29:34 2021-10-27 18:34:48 2021-06-05 14:09:00 2021-06-05 14:42:00
6 2021-10-17 13:30:21 2021-10-17 13:35:26 2021-06-05 14:14:00 2021-06-05 14:37:00
7 2021-10-04 19:59:28 2021-10-04 21:06:24 2021-06-16 19:05:00 2021-06-16 19:16:00
8 2021-10-10 00:27:09 2021-10-10 00:39:58 2021-06-23 20:29:00 2021-06-23 20:43:00
data:
structure(list(started_at = c("2021-10-29 17:42:36", "2021-10-01 15:06:10",
"2021-10-28 23:02:53", "2021-10-17 00:58:17", "2021-10-27 18:29:34",
"2021-10-17 13:30:21", "2021-10-04 19:59:28", "2021-10-10 00:27:09"
), ended_at = c("2021-10-29 18:00:23", "2021-10-01 15:09:23",
"2021-10-28 23:07:11", "2021-10-17 01:02:08", "2021-10-27 18:34:48",
"2021-10-17 13:35:26", "2021-10-04 21:06:24", "2021-10-10 00:39:58"
), started_at_1 = c("6/13/21 11:40", "6/27/21 16:26", "6/10/21 20:06",
"6/11/21 15:54", "6/5/21 14:09", "6/5/21 14:14", "6/16/21 19:05",
"6/23/21 20:29"), ended_at_1 = c("6/13/21 12:02", "6/27/21 16:39",
"6/10/21 20:28", "6/11/21 16:11", "6/5/21 14:42", "6/5/21 14:37",
"6/16/21 19:16", "6/23/21 20:43")), class = "data.frame", row.names = c(NA,
-8L))