Tags: r, dplyr, readr

How to load multiple files from a folder and use part of filename as column in dataset


I am slowly fumbling my way around R and learning lots thanks to forums like this and blogs. I have found a handy piece of code (below) to solve part of a new problem but now I am stuck.

library(readr)
library(dplyr)

myFiles <- list.files(path = "C:/Desktop/M2P/", pattern = "\\.txt$", full.names = FALSE)
myTable <- sapply(myFiles, read_csv, simplify=FALSE) %>% 
  bind_rows(.id = "id")

All of the filenames in the path are like this: 'YYYYMMDD_SUMMARY.txt'

Each file contains a number of columns separated by ","

The code above adds a new column ("id") to the table holding the exact name of the file each row was loaded from, alongside all of my data columns. This is great, however ...

I would like to adjust this so that I get a column added which is just the date part of the filename, that is, YYYY-MM-DD. I want to use this date later to drive some functionality and to group the data.

Is this possible?


Solution

  • Add a mutate statement that keeps only the leading digits of each file name (the YYYYMMDD part) in the "id" column.

    library(dplyr)
    library(readr)
    
    sapply(myFiles, read_csv, simplify=FALSE) %>% 
       bind_rows(.id = "id") %>%
       mutate(id = sub('(\\d+).*', '\\1', id))
       # If you need a Date object (printed as YYYY-MM-DD), use this line instead:
       # mutate(id = lubridate::ymd(sub('(\\d+).*', '\\1', id)))
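A variation on the same idea, offered as a minimal sketch rather than the definitive approach: name the file vector by its date prefix before reading, so that bind_rows() writes the date straight into the id column. The folder m2p_demo, the two sample files, and the names my_files, my_table and file_date are all made up for illustration. Base as.Date() is used in place of lubridate::ymd() only to avoid the extra dependency.

```r
library(readr)
library(dplyr)

# Hypothetical demo folder with two files named like 'YYYYMMDD_SUMMARY.txt'
demo_dir <- file.path(tempdir(), "m2p_demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(a = 1:2, b = c("x", "y")),
          file.path(demo_dir, "20200101_SUMMARY.txt"))
write_csv(data.frame(a = 3:4, b = c("z", "w")),
          file.path(demo_dir, "20200102_SUMMARY.txt"))

# Full paths, so reading does not depend on the working directory
my_files <- list.files(demo_dir, pattern = "_SUMMARY\\.txt$", full.names = TRUE)

# Name each path by the 8-digit prefix of its base name;
# lapply() keeps these names and bind_rows() copies them into the .id column
names(my_files) <- sub("^(\\d{8})_.*$", "\\1", basename(my_files))

my_table <- lapply(my_files, read_csv) %>%
  bind_rows(.id = "file_date") %>%
  # Convert the 'YYYYMMDD' string to a Date, which prints as YYYY-MM-DD
  mutate(file_date = as.Date(file_date, format = "%Y%m%d"))

print(my_table)
```

Taking basename() first means digits elsewhere in the folder path (such as the "2" in "M2P") cannot be picked up by the regular expression, which matters once full.names = TRUE is used. Because file_date is a real Date, it can be passed directly to group_by() or compared against other dates later.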