I have a 1.2 GB .csv file on my disk. I use R's `filename = read.csv(path)` function and then check the object size via `object.size(filename)`, and it turns out that it's 3721 MB. Why is there such a difference?
A CSV file is a plain text file and might look like this:
1,2,3,4
3,2,3,2
3,4,2,1
Each character (i.e. each digit and comma) takes one byte. This file is 24 bytes (there's an invisible newline character at the end of each row, so each row is 8 bytes).
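One quick way to confirm the 24-byte count is to build the same text in R and measure it. This is a sketch: the variable names and the temporary file are illustrative and not from the question. `writeBin()` is used so each newline is written as a single byte on every platform (text-mode output on Windows would turn each `\n` into `\r\n`).

```r
# The three-row CSV from the example, as one string (illustrative)
csv_text <- "1,2,3,4\n3,2,3,2\n3,4,2,1\n"

# Every ASCII digit, comma and newline is exactly one byte
csv_bytes <- nchar(csv_text, type = "bytes")
print(csv_bytes)  # [1] 24

# Write the raw bytes to a temporary file; writeBin() avoids any
# platform-specific newline translation, so the size on disk matches
path <- tempfile(fileext = ".csv")
writeBin(charToRaw(csv_text), path)
print(file.size(path))  # [1] 24
```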
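When read into R, each number is stored as a double-precision floating-point number, which takes 8 bytes. The 12 values in the file above would then take 8 * 12 (values) = 96 bytes, four times the size of the file. (Strictly, `read.csv()` reads whole numbers like these as integers, which take 4 bytes each, and `object.size()` also counts the overhead of each column vector, the column names and the row names. Either way, the object in memory ends up larger than the text on disk.)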
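The arithmetic can be checked the same way. This is a minimal sketch, again using an illustrative `csv_text` variable: it counts the values, compares the 8-byte (double) and 4-byte (integer) payloads, and lets `object.size()` report the real footprint including R's overhead. Exact `object.size()` figures depend on the platform and R version, so none are hard-coded here.

```r
csv_text <- "1,2,3,4\n3,2,3,2\n3,4,2,1\n"
df <- read.csv(text = csv_text, header = FALSE)

# 3 rows x 4 columns = 12 values
n_values <- nrow(df) * ncol(df)
print(n_values)

# read.csv() stores whole numbers as integers (4 bytes), not doubles
print(sapply(df, typeof))

# Raw payload if every value were a double vs. an integer
print(n_values * 8)  # [1] 96
print(n_values * 4)  # [1] 48

# object.size() adds vector headers, column names and row names on top
print(object.size(df))
print(object.size(as.double(unlist(df))))
```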
It can also go the other way. If the file above were instead written as:
1.0000000000, 2.0000000000, 3.00000000000, 4.000000000
[etc]
then each number takes about 12 to 14 bytes in the CSV (every digit, decimal point, comma and space is one byte), yet once read into R it still takes only 8 bytes as a double.
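For the padded case, here is a sketch with four values written to 10 decimal places each. The uniform padding and the lack of spaces after the commas are simplifications of the example above, chosen so that `read.csv()` parses every field cleanly.

```r
# Four values padded to 10 decimal places (simplified from the example)
padded <- "1.0000000000,2.0000000000,3.0000000000,4.0000000000\n"

# 4 x 12 characters + 3 commas + 1 newline = 52 bytes of text
print(nchar(padded, type = "bytes"))  # [1] 52

# The decimal point makes read.csv() store each field as a double,
# i.e. 4 x 8 = 32 bytes of payload, no matter how many zeros it had
df <- read.csv(text = padded, header = FALSE)
print(sapply(df, typeof))
```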