
Ways to read only select columns from a file into R? (A happy medium between `read.table` and `scan`?)


I have some very big delimited data files and I want to process only certain columns in R without taking the time and memory to create a data.frame for the whole file.

The only options I know of are `read.table`, which is very wasteful when I only want a couple of columns, and `scan`, which seems too low-level for what I want.

Is there a better option, either in pure R or by calling out to a shell script that extracts the columns and then running `scan` or `read.table` on its output? (Which raises a second question: how do I call a shell script and capture its output in R?)


Solution

  • Sometimes I do something like this when I have the data in a tab-delimited file:

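    # cut keeps fields 1, 5 and 28; its output is still tab-delimited,
    # so sep = "\t" keeps fields that contain spaces intact
    df <- read.table(pipe("cut -f1,5,28 myFile.txt"), sep = "\t")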
    

    That lets `cut` do the column selection; it streams through the file line by line, so it needs very little memory.

    See the question "Only read limited number of columns" for a pure R version, which passes "NULL" in the `colClasses` argument of `read.table` for each column you want to skip.
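A minimal sketch of the pure R `colClasses` approach mentioned above. The temporary file, its column names, and the choice of columns 1 and 3 are made up for illustration. In `colClasses`, `"NULL"` tells `read.table` to skip a column and `NA` lets it guess the type as usual. The column count is taken from the first line only, so the whole file is not scanned twice.

```r
# Hypothetical example file: 4 tab-delimited columns with a header row
tmp <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc\td",
             "1\tx\t2.5\tTRUE",
             "2\ty\t3.5\tFALSE"), tmp)

# Count the columns from the first line only
n_cols <- length(strsplit(readLines(tmp, n = 1), "\t")[[1]])

# Skip every column ("NULL") except the ones we want (NA = guess the type)
keep <- c(1, 3)
classes <- rep("NULL", n_cols)
classes[keep] <- NA

df <- read.table(tmp, header = TRUE, sep = "\t", colClasses = classes)
str(df)  # a data.frame with only columns a and c
```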
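For the side question of calling a shell command and capturing its output, here is a sketch that assumes a Unix-like system with `cut` on the PATH; the file and its contents are hypothetical. `pipe()` hands the command's standard output to `read.table` as a connection, while `system2(..., stdout = TRUE)` (or `system(cmd, intern = TRUE)`) returns the output as a character vector with one element per line.

```r
# Hypothetical tab-delimited file; "ann lee" contains a space on purpose
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tscore",
             "1\tann lee\t90",
             "2\tbob\t85"), tmp)

# Let cut select fields 1 and 3, and read its output through a pipe() connection
cmd <- paste("cut -f1,3", shQuote(tmp))
df <- read.table(pipe(cmd), header = TRUE, sep = "\t")

# Capture a command's raw output as a character vector instead
lines <- system2("cut", c("-f2", shQuote(tmp)), stdout = TRUE)
print(lines)
```

The `pipe()` route is handy when the output is a table; the `system2()` route suits free-form output that you want to post-process yourself.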