
How to extract part of the file name passed to `read.table` and add it as a new column


I'm looking for a way to automatically extract part of a file name and populate a column with it while reading that file into R. Here's an example:

library(dplyr)    # provides %>% and mutate()
library(stringr)  # provides str_extract()
df <- read.table("[MYPATH]/Mon1_2016-10-06_@exyz66698.txt", 
                 comment.char = "", header = FALSE, quote = "", sep = "\t", stringsAsFactors = FALSE, encoding = "UTF-8") %>%
  mutate(Rec = str_extract(myFileNames[1], ".*\\d"))

This works, however, only because I prepared the vector myFileNames in advance with all relevant file names and happen to know that the file name "Mon1_2016-10-06_@exyz66698.txt" is the first element of that vector.

How can I achieve the same result without this detour to an external vector?


Solution

  • As I suggested in my comment (untested code):

    readFileAndAddFileNameAsColumn <- function(fName) {
      read.table(
        fName, 
        comment.char = "", 
        header = FALSE, 
        quote = "", 
        sep = "\t", 
        stringsAsFactors = FALSE, 
        encoding = "UTF-8"
      ) %>%
      mutate(Rec = str_extract(fName, ".*\\d"))
    }
    
    readFileAndAddFileNameAsColumn("[MYPATH]/Mon1_2016-10-06_@exyz66698.txt")
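
One thing to watch for: the pattern `.*\\d` is applied to whatever string is passed in, so with a full path the directory part ends up in `Rec` as well. A minimal sketch, assuming only the file name itself is wanted, wraps the argument in `basename()` first. The path and the variable names `withDir` and `fileOnly` are made up for illustration.

```r
library(stringr)

# Hypothetical path, for illustration only
fName <- "some/dir/Mon1_2016-10-06_@exyz66698.txt"

# Greedy match up to the last digit: the directory part is kept
withDir <- str_extract(fName, ".*\\d")

# basename() drops the directory before the pattern is applied
fileOnly <- str_extract(basename(fName), ".*\\d")

print(withDir)
print(fileOnly)
```

If that is the behaviour you want, the same `basename()` call can go inside the `mutate()` of the function above.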
    

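    Alternatively, you may be able to use the .id argument of bind_rows if you pass your myFileNamesList to lapply. Note that .id takes its values from the names of the list, so myFileNamesList has to be named with the strings you want in Rec; for an unnamed list you get the positions 1, 2, ... instead. Something like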

    lapply(
      myFileNamesList,
      read.table,
      comment.char = "",
      header = FALSE,
      quote = "",
      sep = "\t",
      stringsAsFactors = FALSE,
      encoding = "UTF-8"
    ) %>%
    bind_rows(.id = "Rec")
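
For completeness, here is a self-contained sketch of the `lapply` / `bind_rows` route (also only lightly checked). It assumes the files live in one directory and that the names of `myFileNamesList` are set from the file names before reading. The temporary directory, the second file name and the file contents are invented just to make the example runnable.

```r
library(dplyr)
library(stringr)

# Two small tab-separated files in a temporary directory (made-up data)
tmpDir <- tempfile("recs")
dir.create(tmpDir)
writeLines(c("a\t1", "b\t2"), file.path(tmpDir, "Mon1_2016-10-06_@exyz66698.txt"))
writeLines("c\t3", file.path(tmpDir, "Mon2_2016-10-07_@exyz66699.txt"))

# Full paths of all .txt files, sorted alphabetically by list.files()
myFileNamesList <- list.files(tmpDir, pattern = "\\.txt$", full.names = TRUE)

# Name each element after the part of the file name to keep;
# lapply() carries these names over to the list it returns
names(myFileNamesList) <- str_extract(basename(myFileNamesList), ".*\\d")

df <- lapply(
  myFileNamesList,
  read.table,
  comment.char = "",
  header = FALSE,
  quote = "",
  sep = "\t",
  stringsAsFactors = FALSE,
  encoding = "UTF-8"
) %>%
  bind_rows(.id = "Rec")  # Rec is filled from the list names

print(df)
```

This reads every file in one go, so there is no need to know which position a given file has in the vector.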