I have a data frame with poorly formatted date information.
date = c("18102016", "11102017", "4052017", "18102018", "3102018")
df <- data.frame(date = date, x1 = 1:5, x2 = rep(1,5))
I have already written the function fix_date_all(), which does the proper formatting when applied to the vector df$date:
library(dplyr, warn.conflicts = FALSE)  # provides %>%

fix_date_all <- function(date) {
  fix_date <- function(d) {
    # pad 7-digit strings such as "4052017" with a leading zero
    if (nchar(d) != 8) d <- paste0("0", d)
    dd <- d %>% substr(1, 2)
    mm <- d %>% substr(3, 4)
    yyyy <- d %>% substr(5, 8)
    d <- paste0(dd, ".", mm, ".", yyyy) %>% as.Date(format = "%d.%m.%Y")
    d
  }
  lapply(date, fix_date)
}
fix_date_all(df$date)
Now I would like to convert this variable to a proper date format in a tidyverse-like style:
df %>% mutate(across(date, fix_date_all))
However, when I use it this way, the dates get mangled and show up as plain numbers:
date x1 x2
1 17092 1 1
2 17450 2 1
3 17290 3 1
4 17822 4 1
5 17807 5 1
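The numbers above are not corrupted values: they are the dates' underlying day counts since 1970-01-01 (17092 is 2016-10-18). lapply() returns a list of length-one Date objects, so mutate() stores a list column, which the data frame print method shows as raw numbers rather than formatted dates. The base-R sketch below illustrates this and one way to keep lapply(): combining the list with do.call(c, ...), which dispatches to the Date method of c(). For brevity it simplifies the question's helper to a single as.Date() call, and fix_date_all_vec() is a hypothetical name.

```r
# Vector from the question
date <- c("18102016", "11102017", "4052017", "18102018", "3102018")

# Simplified per-element helper (base R only, no %>%)
fix_date <- function(d) {
  if (nchar(d) != 8) d <- paste0("0", d)  # "4052017" -> "04052017"
  as.Date(d, format = "%d%m%Y")
}

res_list <- lapply(date, fix_date)
class(res_list)         # "list": each element is a length-1 Date
unclass(res_list[[1]])  # 17092, i.e. days since 1970-01-01

# c() on Date objects keeps the class, so do.call(c, ...) yields a Date vector
res_vec <- do.call(c, res_list)
class(res_vec)          # "Date"

# Hypothetical wrapper that returns a Date vector instead of a list
fix_date_all_vec <- function(date) do.call(c, lapply(date, fix_date))
fix_date_all_vec(date)
```

With dplyr loaded, passing fix_date_all_vec to across() in place of fix_date_all should then produce a Date column, although each string is still parsed one at a time.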
A second option would be to get rid of lapply() and rewrite your function in a vectorized way using e.g. stringr::str_pad():
library(dplyr, warn.conflicts = FALSE)
fix_date_all <- function(date) {
  date %>%
    stringr::str_pad(width = 8, pad = "0") %>%
    as.Date(format = "%d%m%Y")
}
fix_date_all(df$date)
#> [1] "2016-10-18" "2017-10-11" "2017-05-04" "2018-10-18" "2018-10-03"
df %>%
  mutate(across(date, fix_date_all))
#> date x1 x2
#> 1 2016-10-18 1 1
#> 2 2017-10-11 2 1
#> 3 2017-05-04 3 1
#> 4 2018-10-18 4 1
#> 5 2018-10-03 5 1
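If you would rather avoid the stringr dependency, the same vectorized idea can be sketched in base R with formatC(), which zero-pads numbers when given flag = "0". fix_date_base() is a hypothetical name, and the sketch assumes every entry is a 7- or 8-digit ddmmyyyy string with no separators.

```r
# Hypothetical base-R variant of fix_date_all(): pads with formatC()
# instead of stringr::str_pad(). Assumes 7- or 8-digit ddmmyyyy strings.
fix_date_base <- function(date) {
  # as.character() first so factor input is not turned into level codes;
  # formatC(flag = "0") then left-pads to width 8, e.g. 4052017 -> "04052017"
  padded <- formatC(as.integer(as.character(date)), width = 8, flag = "0")
  as.Date(padded, format = "%d%m%Y")
}

date <- c("18102016", "11102017", "4052017", "18102018", "3102018")
df <- data.frame(date = date, x1 = 1:5, x2 = rep(1, 5))

df$date <- fix_date_base(df$date)
str(df)
```

Because it returns a Date vector of the same length as its input, it can also be passed to across() in the same way as the stringr version.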