Search code examples
rdatetimelubridatereadr

How to convert multiple character variables to datetime at once?


Trying to convert multiple character variables to datetimes. Simplified example:

#create df/tibble with two "datetime" columns still as character 
df=tibble(date1=c("2013-11-26 00:10:12.536","2013-11-26 23:04:32.512","2014-02-19 23:34:44.459"),
          date2=c("2013-11-26 07:06:40.720","2013-11-27 07:09:50.552","2014-02-20 08:00:03.975"))

datetimeFormat="%Y-%m-%d %H:%M:%OS"

#OK: converting a single var using $
df_temp=df
df_temp$date1=as_datetime(df_temp$date1,format = datetimeFormat)

#not OK: converting a single var using indexing (presumably because df_temp[,"date1"] is still a tibble)
df_temp=df
df_temp[,"date1"]=as_datetime(df_temp[,"date1"],format = datetimeFormat)

#also not OK: converting multiple variables in one go
datetimeVars=c("date1","date2")
df_temp=df
df_temp[,datetimeVars]=as_datetime(df_temp[,datetimeVars],format = datetimeFormat)

How can I convert multiple character columns to datetime at once, specifically using a variable containing the variable names (like datetimeVars above)?

Some context:

  • my source csv files are not uniform and contain a variable - often large - number of datetimes-to-be (in custom format as example above). I can determine which variables should become datetime from their names
  • read_csv does not consistently recognize the relevant variables as datetime
  • read_csv does not appear to allow setting variable type for multiple variables at once, so can't do something like: df=read_csv("myFile.csv",col_types=cols(datetimeVars=col_datetime(format=datetimeFormat))) I also can't specify/hardcode variable type for each relevant variable like cols(date1=col_datetime(),date2=col_datettime, date3=...) because the number of datetime variables isn't known ahead of time

So currently stuck at both the levels of import (read_csv) and conversion (as_datetime). Suggestions welcome.


Solution

  • Tackling the conversion part since the import highly depends on the files and the included formats.

    Convert to date class using as.POSIXct (keep in mind that the date class is always shown in the format printed but keeps more info in the class object - see Reading below).

    library(dplyr)
    
    datetimeVars <- c("date1", "date2")
    
    df_date <- df %>% 
      summarise(across(all_of(datetimeVars), as.POSIXct))
    df_date
    # A tibble: 3 × 2
      date1               date2              
      <dttm>              <dttm>             
    1 2013-11-26 00:10:12 2013-11-26 07:06:40
    2 2013-11-26 23:04:32 2013-11-27 07:09:50
    3 2014-02-19 23:34:44 2014-02-20 08:00:03
    

    or with column names matching a starting pattern (starts_with())

    datetimeVars <- c("date")
    
    df_date <- df %>% 
      summarise(across(starts_with(datetimeVars), as.POSIXct))
    df_date
    # A tibble: 3 × 2
      date1               date2              
      <dttm>              <dttm>             
    1 2013-11-26 00:10:12 2013-11-26 07:06:40
    2 2013-11-26 23:04:32 2013-11-27 07:09:50
    3 2014-02-19 23:34:44 2014-02-20 08:00:03
    

    Reading your desired format from date class with strftime

    df_date %>% 
      summarise(across(starts_with("date"), strftime, format="%Y-%m-%d %H:%M:%OS3"))
    # A tibble: 3 × 2
      date1                   date2                  
      <chr>                   <chr>                  
    1 2013-11-26 00:10:12.536 2013-11-26 07:06:40.720
    2 2013-11-26 23:04:32.512 2013-11-27 07:09:50.552
    3 2014-02-19 23:34:44.459 2014-02-20 08:00:03.974