R - cut a specific column from multiple files and bind them altogether


I have multiple files (30, tab delimited) that look like the one below:

|target_id    | length| eff_length| est_counts|      tpm|
|:------------|------:|----------:|----------:|--------:|
|LmjF.27.1250 |    966|    823.427|       2932|  94.7314|
|LmjF.09.0430 |   1410|   1267.430|       3603|  75.6304|
|LmjF.13.0210 |   2001|   1858.430|       4435|  63.4897|
|LmjF.28.0530 |   4083|   3940.430|       7032|  47.4778|
|LmjF.16.1400 |    591|    448.577|       1163|  68.9761|
|LmjF.29.2570 |   1506|   1363.430|      11135| 217.2770|

I am trying to extract the fifth column from all 30 of these files with a command such as:

fifth_colum_file1 = file1.csv[ , 5]

But I want to make the process more automatised.

The files I want to work with all have the pattern "bs_abundance" in their names, so I think a good starting point would be to list them with a command such as:

temp = list.files(pattern="*bs_abundance")

Or perhaps I can load all the tables I want to work with directly into the workspace:

for (i in temp) {
  x <- read.table(i, header = TRUE, comment.char = "A", sep = "\t")
  assign(i, x)
}

Then, as explained, I want to extract the fifth column of each file and later bind them all to another table with the same number of rows.


Solution

  • Here is a method using lapply that assumes each file in the folder has the same number of rows.

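    # get the names of the files to read (the pattern is a regular expression)
    files <- list.files(pattern = "bs_abundance")
    # optionally drop one file; "removeFileName" is a placeholder
    files <- files[files != "removeFileName"]
    # read each tab-delimited file and keep only its fifth column (tpm)
    myList <- lapply(files, function(f) read.delim(f)[, 5])
    # name each vector after its source file so the columns stay identifiable
    names(myList) <- files
    # combine the vectors into a new data.frame, one column per file
    dfDone <- do.call(data.frame, myList)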
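
    The same idea can be tried end to end on throwaway data. The following is only a minimal sketch under stated assumptions: the folder `bs_demo`, the file names `sample1_bs_abundance.tsv` etc., the row count `n_rows` and the table `other_tbl` are all hypothetical and stand in for your real files and table. It uses `vapply` instead of `lapply`, which stops with an error if any file does not yield a numeric column of the expected length.

```r
# Build three small tab-delimited files in a temporary folder (hypothetical data)
demo_dir <- file.path(tempdir(), "bs_demo")
dir.create(demo_dir, showWarnings = FALSE)

for (k in 1:3) {
  sample_tbl <- data.frame(
    target_id  = c("LmjF.27.1250", "LmjF.09.0430"),
    length     = c(966, 1410),
    eff_length = c(823.427, 1267.430),
    est_counts = c(2932, 3603) * k,
    tpm        = c(94.7314, 75.6304) * k
  )
  write.table(sample_tbl,
              file.path(demo_dir, paste0("sample", k, "_bs_abundance.tsv")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# full.names = TRUE returns paths that can be read from any working directory
files <- list.files(demo_dir, pattern = "bs_abundance", full.names = TRUE)

# Read the fifth column of every file; vapply checks type and length per file
n_rows <- 2  # assumed number of rows shared by all files
tpm_mat <- vapply(files, function(f) read.delim(f)[, 5], numeric(n_rows))
colnames(tpm_mat) <- basename(files)

# Bind the extracted columns to another table with the same number of rows
other_tbl <- data.frame(target_id = c("LmjF.27.1250", "LmjF.09.0430"))
combined <- cbind(other_tbl, tpm_mat)
```

    Note that `cbind` pairs rows by position only, so this relies on every file listing the targets in the same order as the table you bind to. If that is not guaranteed, merging on `target_id` would be the safer route.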