
Quick way to read a large flat file into R as a numeric vector


I have a large (450MB / 250 million rows) flat file of 1s and 0s that looks like this...

    1
    0
    0
    1
    0
    1
    0
    etc...

I am using the following method to read it into R...

    dat <- as.numeric(readLines("my_large_file"))

I am getting the desired data structure but it takes a long time. Any suggestions for a quicker method to achieve the same result?

NB. The order of the 1s and 0s must be preserved. I would consider options in either Python or the Unix command line, but the final data structure is required in R for plotting a graph.


Solution

  • You might do better with scan for numeric files where you just want a vector returned.

    scan("my_large_file", what = integer())
    

    Supplying the what argument speeds up reading further compared with leaving it out, since it tells R to expect integer values. scan also has other arguments that come in handy with large numeric files (e.g. skip and nlines).

    In addition, as mentioned by @baptiste in the comments,

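    library(data.table)
    # fread returns a data.table; take the first column to get a plain vector
    dat <- fread("my_large_file", header = FALSE)[[1]]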
    

    blows both readLines and scan away (on my machine).

    NOTE: Probably a typo, but in your original post, I think readlines should be readLines
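
To check the comparison above on your own machine, here is a rough sketch rather than a definitive benchmark. It assumes a small synthetic file (the temp path and the size n are made up for illustration, standing in for "my_large_file"), times readLines and scan, shows scan reading a slice with skip and nlines, and only tries fread if data.table happens to be installed.

```r
# Write a small synthetic file of 0s and 1s, one value per line.
set.seed(1)
n <- 1e5                                # assumed size, far smaller than the real file
x <- sample(0:1, n, replace = TRUE)     # integer vector of 0s and 1s
path <- tempfile(fileext = ".txt")      # hypothetical stand-in for "my_large_file"
writeLines(as.character(x), path)

# Time the two base R readers.
t_readLines <- system.time(v1 <- as.numeric(readLines(path)))
t_scan      <- system.time(v2 <- scan(path, what = integer(), quiet = TRUE))

# scan() can also read a slice: skip the first 10 lines, then read 5 lines.
chunk <- scan(path, what = integer(), skip = 10, nlines = 5, quiet = TRUE)

# Both readers should return the values in their original order.
same_order <- identical(as.integer(v1), v2) && identical(v2, x)

timings <- c(readLines = t_readLines[["elapsed"]],
             scan      = t_scan[["elapsed"]])

# fread() is only timed if the data.table package is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  t_fread <- system.time(
    v3 <- data.table::fread(path, header = FALSE)[[1]]
  )
  same_order <- same_order && identical(as.integer(v3), x)
  timings <- c(timings, fread = t_fread[["elapsed"]])
}

print(timings)   # elapsed seconds per reader; values depend on the machine
unlink(path)     # remove the temporary file
```

Timings on a file this small are noisy, so increase n toward the real file size before drawing conclusions. The same_order flag only confirms that every reader returns the values in the order they appear in the file.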