r, raster, filesize, read.csv

Why is the raster file size so much different from the object size?


I have a 1.2 GB .csv file on my disk. I read it in R with filename = read.csv(path) and then check the object size via object.size(filename), and it turns out to be 3721 MB. Why is there such a difference?


Solution

  • A CSV file is a plain text file and might look like this:

    1,2,3,4
    3,2,3,2
    3,4,2,1
    

    Each character (i.e. each digit and comma) is one byte. This file is 24 bytes (there's an invisible "new line" character at the end of each row).

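    When read into R as floating-point numbers, each value is stored as a double, which takes 8 bytes. The data from the file above would then take 8 * 12 (values) = 96 bytes, four times the size on disk. (If R parses a column as whole numbers, it stores them as 4-byte integers instead.)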

    It can also go the other way. If the file above were instead written as:

    1.0000000000, 2.0000000000, 3.00000000000, 4.000000000
    [etc]
    

    then in the CSV each number takes about 12 bytes (every digit, decimal point, comma and zero takes a byte), yet when read into R each one would still take only 8 bytes as a floating-point value.
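
The arithmetic above can be checked with a small sketch. It assumes a 64-bit R session; the helper `write_csv_lines` and the variable names are made up for illustration, and the files are written in binary mode so that every row ends in a single newline byte on any platform. Note that `object.size()` also counts a fixed per-object overhead, so the 8-bytes-per-value figure only shows up clearly on a large vector.

```r
# Hypothetical helper: write lines with "\n" endings and return the file path.
write_csv_lines <- function(lines) {
  path <- tempfile(fileext = ".csv")
  con <- file(path, open = "wb")   # binary mode: no "\r\n" translation
  writeLines(lines, con, sep = "\n")
  close(con)
  path
}

rows <- list(c(1, 2, 3, 4), c(3, 2, 3, 2), c(3, 4, 2, 1))

# Compact text: one byte per digit, comma and newline.
compact_lines <- sapply(rows, paste, collapse = ",")
compact_path  <- write_csv_lines(compact_lines)
compact_bytes <- file.size(compact_path)   # 24

# Padded text: the same values written with 10 decimal places.
padded_lines <- sapply(rows, function(r) paste(sprintf("%.10f", r), collapse = ","))
padded_path  <- write_csv_lines(padded_lines)
padded_bytes <- file.size(padded_path)     # 156

# Read both files as doubles: 12 values each, 8 bytes of data per value.
compact_values <- scan(compact_path, sep = ",", quiet = TRUE)
padded_values  <- scan(padded_path, sep = ",", quiet = TRUE)
data_bytes     <- length(compact_values) * 8   # 96

# Per-element cost measured on a large vector, where overhead is negligible.
n <- 1e6
bytes_per_double  <- as.numeric(object.size(numeric(n))) / n   # roughly 8
bytes_per_integer <- as.numeric(object.size(integer(n))) / n   # roughly 4

cat(compact_bytes, padded_bytes, data_bytes, "\n")
```

So the same 12 values need 24 bytes on disk in the compact form and 156 bytes in the padded form, but 96 bytes of data in memory either way. Whether a file grows or shrinks on import depends on how many characters each value takes in the text.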