I am writing a simple command-line Rscript that reads some binary data and outputs it as a stream of numeric characters. The data is in a specific format, and R has a very fast library for dealing with the binary files in question. The file (of 7 million characters) is read quickly, in less than a second:
library(affyio)
system.time(CEL <- read.celfile("testCEL.CEL"))
user system elapsed
0.462 0.035 0.498
I want to write part of the read data to stdout:
str(CEL$INTENSITY$MEAN)
num [1:6553600] 6955 225 7173 182 148 ...
As you can see, it's numeric data with ~6.5 million integers.
And the writing is terribly slow:
system.time(write(CEL$INTENSITY$MEAN, file="TEST.out"))
user system elapsed
8.953 10.739 19.694
(Here the writing is done to a file, but writing to standard output from Rscript takes the same amount of time.) Using
cat(vector)
does not improve the speed at all. One improvement I found is this:
system.time(writeLines(as.character(CEL$INTENSITY$MEAN), "TEST.out"))
user system elapsed
6.282 0.016 6.298
It is still a far cry from the reading speed (and the read brought in 5 times more data than this particular vector). Moreover, I have the overhead of converting the entire vector to character before I can proceed. And when sinking to stdout, I cannot terminate the stream with Ctrl+C if I accidentally fail to redirect it to a file.
So my question is: is there a faster way to simply output a numeric vector from R to stdout?
Also, why is reading data in so much faster than writing it out? This is not only the case for binary files, but in general:
system.time(tmp <- scan("TEST.out"))
Read 6553600 items
user system elapsed
1.216 0.028 1.245
Binary reads are fast. Printing to stdout is slow for two reasons: the numbers have to be converted to their character representation (formatting), and the resulting characters actually have to be written out (the I/O).
You can benchmark / profile either step. But if you really want to be "fast", stay away from formatting when printing lots of data.
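For example, one way to see where the time goes is to time the two steps separately (a minimal sketch, reusing the vector and file name from the question):

x <- CEL$INTENSITY$MEAN
system.time(s <- as.character(x))        # cost of formatting the numbers as text
system.time(writeLines(s, "TEST.out"))   # cost of the actual write (I/O)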
Compiled code can help make the conversion faster. But again, the fastest solution will be to not format at all.
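As a rough sketch of both ideas (assuming whatever consumes the output can cope; TEST.bin is just an illustrative file name): writeBin() skips formatting entirely and dumps the raw bytes, and a compiled text writer such as data.table::fwrite() converts much faster than base write() or writeLines():

# Skip formatting entirely: dump the doubles as raw bytes.
# Assumes the consumer reads binary and agrees on size/endianness.
con <- file("TEST.bin", open = "wb")
writeBin(CEL$INTENSITY$MEAN, con)
close(con)

# If text output is required, data.table's compiled fwrite() is much
# faster at formatting than write() or writeLines().
library(data.table)
fwrite(list(CEL$INTENSITY$MEAN), "TEST.out", col.names = FALSE)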