Search code examples
rfreadrbindlist

rbindlist - how to get an additional column with info about a source?


I have more than 30 large .csv files stored in one folder. I would like to read them in R as one data.frame/data/table with the following criteria:

(1) first and last 25 rows of each file should be skipped (number of rows differs in each file)

(2) last column should contain unique information on the source of the row (eg. filename.csv.rownumber from the raw file). A number of columns differ in each file as well.

So far I have this:

ASC_files <- list.files(pattern="*.csv")

read_ASC <- function(x){
ASC <-fread(x, skip=25)
return(ASC[1:(nrow(ASC)-25),])
}

ASC_list <-lapply(ASC_files, read_ASC)
ASC_all <- rbindlist(ASC_list, use.names=TRUE)

However, I have no idea how to get an additional column with information on the source of each row...


Solution

  • Thanks everyone for commenting my question. Finally, I came out with this solution:

    ASC_files <- list.files(pattern="*.asc")
    ASC_all <- sapply(ASC_files, function(x) read.csv(x, header=FALSE, col.names
    paste0('V', 1:1268) , sep="", stringsAsFactors = FALSE))
    #adding a new column with name of the source file
    ASC_all <- mapply(cbind, ASC_all, "source"=ASC_files, SIMPLIFY = FALSE)
    #adding a new column with row number
    ASC_all <- map(ASC_all, ~rowid_to_column(.x, var="row"))
    #removing last and first 25 rows in each dataframe of the list
    ASC_all <- lapply(ASC_all, function(x) x[x$row<(nrow(x)-25),])
    ASC_all <- lapply(ASC_all, function(x) x[x$row>25,])
    #transforming the list into a dataframe with all data
    ASC_all <- rbindlist(ASC_all)
    #complementing the kolumn source with the row number (result: filename.csv.rownumber)
    ASC_all$file <- paste0(ASC_all$file, '.', ASC_all$row)
    #removing column with row numbers
    ASC_all$row <- NULL
    

    Maybe it's not the most elegant and efficient code but at least it works.