r, date, mutate

How to maintain date format after applying a function


I have a data frame with poorly formatted date information.

date = c("18102016", "11102017", "4052017", "18102018", "3102018")
df <- data.frame(date = date, x1 = 1:5, x2 = rep(1,5)) 

I have already written the function fix_date_all(), which does the proper formatting when applied to the vector df$date:

fix_date_all<- function(date){
  fix_date <- function(d) {
    if (nchar(d) != 8) d <- paste0("0", d)
    
    dd <- d %>% substr(1,2)
    mm <- d %>% substr(3,4)
    yyyy <- d %>% substr(5,8)
    
    d <- paste0(dd, ".", mm, ".", yyyy) %>% as.Date("%d.%m.%Y")
    
    d
  }
  
  lapply(date, fix_date)
}

fix_date_all(df$date)

Now I would like to transform this variable to a proper date format in a tidyverse-like style:

df %>% mutate(across(date, fix_date_all))

However, when I use it this way, the date column is shown as numbers instead of dates:

   date x1 x2
1 17092  1  1
2 17450  2  1
3 17290  3  1
4 17822  4  1
5 17807  5  1

Solution

  • A second option would be to get rid of lapply and rewrite your function using e.g. stringr::str_pad:

    library(dplyr, warn.conflicts = FALSE)
    
    fix_date_all <- function(date) {
      date %>%
        # left-pad to 8 characters so that "4052017" becomes "04052017"
        stringr::str_pad(width = 8, pad = "0") %>%
        as.Date(format = "%d%m%Y")
    }
    
    fix_date_all(df$date)
    #> [1] "2016-10-18" "2017-10-11" "2017-05-04" "2018-10-18" "2018-10-03"
    
    df %>% 
      mutate(across(date, fix_date_all))
    #>         date x1 x2
    #> 1 2016-10-18  1  1
    #> 2 2017-10-11  2  1
    #> 3 2017-05-04  3  1
    #> 4 2018-10-18  4  1
    #> 5 2018-10-03  5  1
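
For context, the numbers in the question most likely appear because lapply() always returns a list, so mutate() stores a list column and the data frame print shows each date's underlying day count. If you would rather keep a per-element function, the following is a minimal base R sketch of that idea. The name fix_date_all_vec is hypothetical and not part of the original post, and fix_date is rewritten here without the pipe so the snippet needs no packages.

```r
# Same sample data as in the question.
date <- c("18102016", "11102017", "4052017", "18102018", "3102018")

# Per-element parser (pipe-free rewrite of the question's inner function).
fix_date <- function(d) {
  if (nchar(d) != 8) d <- paste0("0", d)  # restore a dropped leading zero
  as.Date(d, format = "%d%m%Y")
}

# lapply() returns a plain list, not a Date vector.
as_list <- lapply(date, fix_date)
class(as_list)
#> [1] "list"

# Hypothetical variant: combine the list elements back into one Date vector.
# do.call(c, ...) keeps the Date class, whereas unlist() would drop it.
fix_date_all_vec <- function(date) {
  do.call(c, lapply(date, fix_date))
}

fixed <- fix_date_all_vec(date)
class(fixed)
#> [1] "Date"
```

Because fix_date_all_vec() returns an ordinary Date vector, it should work inside mutate(across(date, fix_date_all_vec)) in the same way as the str_pad version. The vectorised version above is still the simpler choice.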