r date dplyr lubridate

Search and mass-convert character columns to dates in R with dplyr without explicit specification


I have a messy data frame with a thousand variables and want to automate the conversion of specific columns to dates without having to specify the columns explicitly. All columns to convert have "Date" in their name. Most are mdy, but some can be dmy. Some contain errors or malformed dates, but only in a very small proportion (<0.1%).

I tried:

df %>% select(contains("Date")) %>% as_Date() # does not work
df %>% select(contains("Date")) %>% mdy() # selecting only the columns with dates; does not work
df %>% select(contains("Date")) %>% parse_date_time(c("mdy", "dmy")) # also does not work

I think I don't understand something fundamental.


Solution

  • Here's a solution based on lubridate:

    Toy data:

    df <- data.frame(Date1 = c("01-Mar-2015", "31-01-2012", "15/01/1999"), 
                     Var_Date = c("01-02-2018", "01/08/2016", "17-09-2007"), 
                     More_Dates = c("27/11/2009", "22-Jan-2013", "20-Nov-1987"))
    
    # define formats:
    formats <- c("%d-%m-%Y", "%d/%m/%Y", "%d-%b-%Y")
    

    A dplyr solution:

    library(dplyr)
    library(lubridate)
    df %>% 
      mutate(across(contains("Date"), 
                    ~ parse_date_time(., orders = formats))) %>%
      mutate(across(contains("Date"),
                    ~ format(., "%d-%m-%Y")))
           Date1   Var_Date More_Dates
    1 01-03-2015 01-02-2018 27-11-2009
    2 31-01-2012 01-08-2016 22-01-2013
    3 15-01-1999 17-09-2007 20-11-1987
    

    A base R solution:

    library(lubridate)
    # parse each "Date" column with the formats defined above, then reformat as text
    df[,grepl("Date", names(df))] <- apply(df[,grepl("Date", names(df))], 2, 
                      function(x) format(parse_date_time(x, orders = formats), "%d-%m-%Y"))
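
A note on why the attempts in the question fail, as far as I can tell: `select()` returns a data frame, whereas `mdy()` and `parse_date_time()` expect a vector, so the parser has to be applied column by column, which is what `across()` does. Both solutions above also end with `format()`, which turns the result back into character strings. If real `Date` columns are wanted, a minimal sketch along the following lines may help. It assumes the same three formats as above; the toy data frame `df2`, its deliberately broken `"not a date"` entry, and the names `converted` and `n_failed` are made up for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: one non-date column and one deliberately broken value
df2 <- data.frame(
  id = 1:3,
  Date1 = c("01-Mar-2015", "31-01-2012", "not a date"),
  Var_Date = c("01-02-2018", "01/08/2016", "17-09-2007"),
  stringsAsFactors = FALSE
)

# Same formats as in the answer above
formats <- c("%d-%m-%Y", "%d/%m/%Y", "%d-%b-%Y")

# Parse every column whose name contains "Date"; as_date() keeps the Date class.
# quiet = TRUE silences the "failed to parse" warning; failures become NA.
converted <- df2 %>%
  mutate(across(contains("Date"),
                ~ as_date(parse_date_time(.x, orders = formats, quiet = TRUE))))

# Count the values that could not be parsed, per column
n_failed <- converted %>%
  summarise(across(contains("Date"), ~ sum(is.na(.x))))

str(converted)
print(n_failed)
```

On real data, `n_failed` gives a quick check that the share of unparseable values is in the expected range. Since most of the columns in the question are mdy, the `orders` vector would need mdy-style entries as well; how an ambiguous value such as `01-02-2018` is then resolved depends on lubridate's format guessing, so spot-checking a few rows is worthwhile.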