
Convert identical column of multiple dataframes


I have a number of dataframes in the global environment; let's call them a, b, and c.

Each of the dataframes has a column named start_time that needs to be converted to a POSIX date-time class, and I am looking for a way to do this without writing out the same code for each dataframe. The code is:

 a$start_time <- strptime(a$start_time, format = '%Y-%m-%d %H:%M:%S')

That only converts start_time in a.

Using the dataframe names, how could one loop over each of the dataframes and convert start_time to POSIX?

This attempt with lapply only works on the first dataframe...

ll <- list(a, b, c)
lapply(ll,function(df){
  df$start_time <- strptime(df$start_time, format = '%Y-%m-%d %H:%M:%S')         

})

Solution

  • Data: df1, df2, df3

    df1 <- data.frame(start_time = seq(Sys.time(), Sys.time() + 100, 10))    
    df2 <- data.frame(start_time = seq(Sys.time(), Sys.time() + 100, 10))    
    df3 <- data.frame(start_time = seq(Sys.time(), Sys.time() + 100, 10))
    
    # create a vector with names of the data frames   
    data_vec <- c('df1', 'df2', 'df3')
    
    # loop through the data_vec and modify the start_time column
    a1 <- lapply(data_vec, function(x) {
      x <- get(x)
      x <- within(x, start_time <- strptime(start_time, format = '%Y-%m-%d %H:%M:%S'))
      return(x)
    })
    
    # assign names to the modified data in a1
    names(a1) <- data_vec
    
    # list objects in global environment
    ls()
    # [1] "a1"       "data_vec" "df1"      "df2"      "df3" 
    
    # remove df1, df2, df3 from global environment
    # (optional: list2env() below would overwrite the existing objects anyway)
    rm(list = c('df1', 'df2', 'df3') )
    
    # confirm the removal of data
    ls()
    # [1] "a1"       "data_vec"
    
    # assign the named list in a1 as data in global environment
    list2env(a1, envir = .GlobalEnv)
    
    # list objects in global environment and confirm that the data appeared again
    ls()
    # [1] "a1"       "data_vec" "df1"      "df2"      "df3"     
    
    # output
    head(df1)
    #            start_time
    # 1 2017-03-03 22:49:54
    # 2 2017-03-03 22:50:04
    # 3 2017-03-03 22:50:14
    # 4 2017-03-03 22:50:24
    # 5 2017-03-03 22:50:34
    # 6 2017-03-03 22:50:44
    
    head(df2)
    #            start_time
    # 1 2017-03-03 22:49:54
    # 2 2017-03-03 22:50:04
    # 3 2017-03-03 22:50:14
    # 4 2017-03-03 22:50:24
    # 5 2017-03-03 22:50:34
    # 6 2017-03-03 22:50:44
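
The same steps can be written more compactly. The following is a minimal sketch, not the answer's own method. It assumes start_time holds character timestamps as in the question, uses as.POSIXct() instead of strptime() so the column is stored as POSIXct (strptime() returns POSIXlt), and fixes the time zone to UTC purely for illustration. The sample values and the names df_names, env, and converted are made up for this example.

```r
# Hypothetical sample data: character timestamps, as described in the question
a <- data.frame(start_time = c("2017-03-03 22:49:54", "2017-03-03 22:50:04"),
                stringsAsFactors = FALSE)
b <- data.frame(start_time = c("2017-03-04 08:00:00", "2017-03-04 08:00:10"),
                stringsAsFactors = FALSE)

# Names of the data frames to convert
df_names <- c("a", "b")

# At top level, environment() is the global environment
env <- environment()

# mget() fetches the data frames into a named list; the function must
# return the modified data frame, otherwise lapply() keeps only the column
converted <- lapply(mget(df_names, envir = env), function(df) {
  # tz = "UTC" is an assumption; use the time zone of your data
  df$start_time <- as.POSIXct(df$start_time,
                              format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  df
})

# Write the converted data frames back under their original names
invisible(list2env(converted, envir = env))

class(a$start_time)
```

Because lapply() works on copies, the write-back step with list2env() is what makes the change visible in a and b; without it, only the list converted holds the converted columns.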