
MATLAB fwrite overhead


I have binary (logical) data that I want to save to a file in the least amount of space possible. When I check the data size in the MATLAB workspace it shows 103 KB, but when I save it using fwrite with the ubit1 precision the file is 105 KB. What can I do to save it in the least possible space?


Solution

  • fwrite in MATLAB adds no overhead (or perhaps you meant metadata). The function is as "low level" as it gets, and on a given machine it will give results similar to the equivalent low-level functions in C, C++ and many other languages.

    To access the disk, they all rely on even lower-level functions, driven by the filesystem of your disk and your operating system. So between different disks, filesystems and operating systems you may observe small differences in the final result, but on a given system (disk/FS/OS), MATLAB's fwrite behaves like its counterparts in other languages, and there is no "overhead".


    Now to the size of data versus size of file versus size on disk:

    Consider the following snippet:

    nbits = 376 ;
    A = true( nbits , 1 ) ;
    
    fid = fopen( 'testsize.bin' , 'w' ) ;
    fwrite( fid , A , 'ubit1' ) ;
    fclose(fid) ;
    

    This creates an array of 376 logicals, then writes them to disk with the ubit1 format.

    Before we look at the file, notice that, as mentioned in Horchler's comment, MATLAB still uses a full byte (8 bits) in memory for each logical (boolean).

    >> whos A
      Name        Size            Bytes  Class      Attributes
      A         376x1               376  logical
    

    This is not a problem, however: when fwrite writes to disk, the ubit1 format tells it to use only the (single) significant bit of each element, so, as Horchler commented, the file will be exactly 1/8th the size of the variable in memory...

    or will it?

    If I take a quick look at my file explorer, ouch: it reports 1 KB.

    (This is all done on a PC, Windows 8, NTFS file system.)

    1 KB? Nah, that is only because the explorer is not designed to display sizes smaller than that; the value is just rounded up. (Unix/Linux users may get a better display, but I'm on Windows and have to deal with it.)

    To get better information, I have to query more details, so I open the properties of the file:
    (screenshot of the file properties: size 47 bytes, size on disk 4 KB)

    Phew, 47 bytes. That sounds about right. Let's see: 376/8 = 47, so that's perfect!
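
    The same check can be done without leaving MATLAB. This is a minimal sketch that rewrites the file from the snippet above and queries its size with dir; the variable names fname and info are mine, not part of the original code.

```matlab
nbits = 376;
A = true(nbits, 1);

fname = 'testsize.bin';           % same file name as in the snippet above
fid = fopen(fname, 'w');
fwrite(fid, A, 'ubit1');          % one bit per logical element
fclose(fid);

info = dir(fname);                % struct with a 'bytes' field (file size, not size on disk)
fprintf('%d bits -> %d bytes in the file\n', nbits, info.bytes);
```

    Note that dir reports the file size, not the "size on disk" shown in the properties dialog.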

    Note the "size on disk" of a whopping 4 KB. Why would it need so much space to store my poor 47 bytes? That has to do with the default allocation unit (cluster) size of the filesystem on your disk, and it is one of those things fwrite can do nothing about: it is managed entirely by the OS and the filesystem.

    Now, even if a lot of disk space is being wasted, I still got the information I wanted: my file is actually only 47 bytes. So, success? ... Not yet.
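
    The "size on disk" figure is consistent with simple rounding up to whole allocation units. The arithmetic below is only an illustration: the 4096-byte allocation unit is an assumption inferred from the 4 KB shown in the properties dialog, and the variable names are hypothetical.

```matlab
fileBytes    = 47;       % file size reported in the properties dialog
clusterBytes = 4096;     % ASSUMPTION: 4 KB allocation unit, inferred from the dialog

% The filesystem hands out space in whole allocation units, so round up.
sizeOnDisk = ceil(fileBytes / clusterBytes) * clusterBytes;
fprintf('%d bytes of data occupy %d bytes on disk\n', fileBytes, sizeOnDisk);
```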

    I chose 376 bits at the beginning almost at random, but also because it is an exact multiple of 8. Now let's run the very same code as above, except that we start with:

    nbits = 377 ;
    

    The code runs fine. The file still appears as 1 KB in the explorer, but we know that is misleading. The properties now show:
    (screenshot of the file properties: size 48 bytes)

    377/8 = 47.125, not 48, so is the explorer "rounding" again? NO!

    The file size really is 48 bytes (not one bit more or less). The useful information inside the file occupies only 47 bytes and 1 bit; the last 7 bits are undetermined (possibly set to 0, but don't count on it).
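
    In other words, the file size should follow ceil(nbits/8). The sketch below checks this for a few bit counts around a byte boundary; only the 376 and 377 cases come from the experiment above, the other values (and the names counts and sizes) are mine, so treat the result as something to verify on your own system.

```matlab
fname  = 'testsize.bin';
counts = [376 377 383 384 385];        % bit counts around byte boundaries
sizes  = zeros(size(counts));

for k = 1:numel(counts)
    fid = fopen(fname, 'w');
    fwrite(fid, true(counts(k), 1), 'ubit1');
    fclose(fid);

    info = dir(fname);                 % file size in bytes
    sizes(k) = info.bytes;
    fprintf('%4d bits: %3d bytes (ceil(nbits/8) = %d)\n', ...
            counts(k), sizes(k), ceil(counts(k) / 8));
end
```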

    What happened behind the scenes is that fwrite aggregated my bits into groups of 8, building full bytes, and only wrote full bytes to disk (or sometimes even bigger groups). It has to, because the filesystem (yes, that again) will not let it address individual bits on the disk. The filesystem expects packets of at least a byte. So when it reached the last single bit, fwrite had to pad it with 7 other bits before telling the filesystem to write it to disk.

    I am no expert on all the flavours of filesystem, but I strongly doubt that many will let you address a single bit, so the minimum rounding you should expect is always at least a byte, if not more.
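
    One practical consequence of this padding: the file alone does not tell you how many of its bits are real data, so the bit count has to be known when reading back. A minimal round-trip sketch, assuming the count is kept in a variable (in practice you might store it in a small header, which is not something the original code does):

```matlab
nbits = 377;                          % deliberately not a multiple of 8
A = rand(nbits, 1) > 0.5;             % random logical test data

fname = 'testsize.bin';
fid = fopen(fname, 'w');
fwrite(fid, A, 'ubit1');
fclose(fid);

fid = fopen(fname, 'r');
B = logical(fread(fid, nbits, 'ubit1'));   % read exactly nbits, ignoring the padding
fclose(fid);

fprintf('Round trip identical: %d\n', isequal(A, B));
```

    Asking fread for exactly nbits values is what keeps the padding bits out of the result.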

    Summary

    fwrite does not introduce overhead, other than what the hardware and filesystem force on it (in which case no other function could do better).