I need to read a CSV file into main memory, and I would like to know the fastest programming language for doing that. The file contains a time series:
time, value
1366810163,177.413
1366810164,177.303
1366810165,177.413
1366810166,178.9797
I want to evaluate the I/O performance improvements from compressing the data, as is already done here: http://entland.homelinux.com/blog/2006/10/25/reading-files-as-fas-as-possible/ That blog post is from 2006 and only targets C++. I also want to evaluate the cost of decompression.
So please share your experience with any programming language / operating system. I will then sum up your answers and compile a guide. Thank you for your help!
C or C++ with zlib will be the fastest, if written properly. (Assembler could be faster still, though for large programs it's getting harder to beat good compilers.)
zlib's gz* functions transparently read a file whether or not it is gzip-compressed. It is usually faster to read less data from the mass-storage device and decompress it than to read more uncompressed data, even with an SSD.
On my 2 GHz i7, I can read in and parse a 56.2 MiB CSV file with 201429 records of 24 fields each in about 0.3 seconds of CPU time if uncompressed, 0.4 seconds if compressed. In real time after memory buffers have been purged, reading from an SSD, it's 0.5 seconds if compressed, 0.6 seconds if not compressed. (Note the reversal between CPU time and real time.)