Search code examples
rfilenamesrbindread.csv

Read multiple .txt files and add new column identifying file name in R


I have 1500+ .txt files called data_{date from 2015070918 to today} all with 7 columns worth of data and variable row amounts. I have managed to use the following code to extract and merge the data into one table:

files = list.files(pattern = ".txt")
myData <- lapply(files, function(x) {
tryCatch(read.table(x, header = F, sep = ','), error=function(e) NULL)
})

Note: there are no headers on the columns, currently I don't even know which variable is which!

At the moment the data only has the date in the file name and therefore it isn't possible to distinguish between each subset of daily data. I want to create an additional column to include the date which I can extract if I can include the filename in an additional column.

I searched on stackexchange and came across this possible solution: Importing multiple .csv files into R and adding a new column with file name

df <- do.call(rbind, lapply(files, function(x) cbind(read.csv(x, header = F, sep = ","), name=strsplit(x,'\\.')[[1]][1])))

However I get the following error:

 Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
 no lines available in input 

I have used read.csv on individual files and they have imported without any issues. Any ideas to resolve this would be greatly appreciated!


Solution

  • This should work, if your read.table command is correct:

    myData_list <- lapply(files, function(x) {
      out <- tryCatch(read.table(x, header = F, sep = ','), error = function(e) NULL)
      if (!is.null(out)) {
        out$source_file <- x
      }
      return(out)
    })
    
    myData <- data.table::rbindlist(myData_list)
    

    In the past I found that you can spare yourself a lot of headache using data.table::fread instead of read.table. So you could consider this:

    myData_list <- lapply(files, function(x) {
      out <- data.table::fread(x, header = FALSE)
      out$source_file <- x
      return(out)
    })
    
    myData <- data.table::rbindlist(myData_list)
    

    You can add the tryCatch part back if necessary. Depending on how the files vector looks, basename() might be interesting to use on the column source_file.