Mock data:
my_list <- list(structure(list(start = c("2023-09-12T18:59:49.336+01:00",
"2023-09-18 07:29:34", "z2023-09-18 07:29:34"), country = c("USA",
"UK", "UK")), class = "data.frame", row.names = c(NA, -3L)),
structure(list(start = c("2023-09-16T21:12:51.288Z", "2019-11-30",
"2019-30-11"), country = c("UK", "Canada", "France")), class = "data.frame", row.names = c(NA,
-3L)))
Using the second answer to this question (please note, the first answer does not work for me), I want to convert all dates to year-month-day.
I do:
library(anytime)
lapply(my_list, transform,
start = anydate(start)
)
Output:
[[1]]
start country
1 2023-09-12 USA
2 2023-09-18 UK
3 <NA> UK
[[2]]
start country
1 2023-09-16 UK
2 2019-11-30 Canada
3 <NA> France
As one can see, it does not work for two dates, which are coerced to NA. Note, my real data has dozens of dfs and thousands of dates, with no unique format.
How do we do this? Desired output:
[[1]]
start country
1 2023-09-12 USA
2 2023-09-18 UK
3 2023-09-18 UK
[[2]]
start country
1 2023-09-16 UK
2 2019-11-30 Canada
3 2019-11-30 France
UPDATE
transform
, and to be able to do any modification on other variables (i.e., not only dates, see example below) if neededanytime
, though(Example to illustrate Update point 3:)
lapply(my_list, transform,
start = anydate(start),
country = as.factor(country) # etc.
)
Further, final format need to be a Date
and Factor
, respectively:
List of 2
$ :'data.frame': 3 obs. of 2 variables:
..$ start : Date[1:3], format: "2023-09-12" "2023-09-18" NA
..$ country: Factor w/ 2 levels "UK","USA": 2 1 1
$ :'data.frame': 3 obs. of 2 variables:
..$ start : Date[1:3], format: "2023-09-16" "2019-11-30" NA
..$ country: Factor w/ 3 levels "Canada","France",..: 3 1 2
An approach with parse_date_time
. Using sub
to pull out the date part of the string.
EDIT: with updated data
library(lubridate)
lapply(my_list, \(x)
setNames(data.frame(
parse_date_time(
sub(".*(\\d{4}-\\d+-\\d+).*", "\\1", x[,"start"]),
orders=c("Ymd", "Ydm")),
x[,-1]), # everything but the first "start" column
names(x)))
[[1]]
start country
1 2023-09-12 USA
2 2023-09-18 UK
3 2023-09-18 UK
[[2]]
start country
1 2023-09-16 UK
2 2019-11-30 Canada
3 2019-11-30 France
Keep in mind that ambivalent dates like 2023-08-09
<> 2023-09-08
are interpreted through the first rule Ymd
Using transform
lapply(my_list, \(x)
transform(x,
start=parse_date_time(sub(".*(\\d{4}-\\d+-\\d+).*", "\\1", start),
orders=c("Ymd", "Ydm")),
country=as.factor(country)))