I have a large (450MB / 250 million rows) flat file of 1s and 0s that looks like this...
1
0
0
1
0
1
0
etc...
I am using the following method to read it into R...
dat <- as.numeric(readLines("my_large_file"))
I am getting the desired data structure but it takes a long time. Any suggestions for a quicker method to achieve the same result?
NB. The order of the 1s and 0s must be preserved. I would consider options in either Python or the Unix command line, but the final data structure is required in R for plotting a graph.
You might do better with `scan` for numeric files where you just want a vector returned.

    scan("my_large_file", what = integer())

The `what` argument will speed up the reading of your file even more (as opposed to leaving it out), since you are effectively telling R that it will be reading integer values. `scan` also has many other arguments that come in handy with large numeric files (e.g. `skip`, `nlines`, etc.)
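For instance, a sketch of how `skip` and `nlines` might be combined to read the file in pieces (the chunk size here is illustrative, not from the original post):

    # Read the file one million values at a time, preserving order.
    # "my_large_file" is the single-column file of 1s and 0s from the question.
    chunk_size <- 1e6
    first_chunk  <- scan("my_large_file", what = integer(), nlines = chunk_size)
    second_chunk <- scan("my_large_file", what = integer(),
                         skip = chunk_size, nlines = chunk_size)

This kind of chunked reading can help if the whole 250-million-row vector strains memory, at the cost of re-scanning skipped lines on each call.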
In addition, as mentioned by @baptiste in the comments,

    library(data.table)
    fread("my_large_file")

blows both `readLines` and `scan` away (on my machine).
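One caveat worth noting: `fread` returns a `data.table`, not a plain vector, so to match the data structure produced by `readLines`/`scan` you would extract the first column. A minimal sketch, assuming the file is the single-column 1/0 file from the question:

    library(data.table)
    # header = FALSE guards against the first value being taken as a column name
    dat <- fread("my_large_file", header = FALSE)[[1L]]

`dat` is then an integer vector in the original file order, ready for plotting.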
NOTE: Probably a typo, but in your original post, I think `readlines` should be `readLines`.