Write a mixed-precision binary file with MATLAB


I would like to write a table of records, each made of 1 integer followed by 3 doubles, in binary format. Of course I can just use a for loop:

for i=1:sz
  fwrite(fid, integ(i), 'int');
  fwrite(fid, doubl(i,:), 'double');
end

but this is quite slow for arrays with a few million entries. What is the most efficient way to handle this (without having to write a .mex)?

Unfortunately I must keep this [int32 float64 float64 float64] format, since this is a file format specification used by a program.


Solution

  • Edit: In the end, the fastest way I found that respects the exact order and type of each variable is to reinterpret the n×3 table of doubles as int32 words (each double becomes two int32), interleave them with the integers in a (2×3+1)×n int32 array holding one record per column, then write everything in one go.

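    outfile4 = 'test1.bin' ;
    tic4 = tic ;
    
    % // reshape everything: one record per column of an int32 array
    [nPt, nCol] = size(doubl) ;   % number of records, number of double columns
    table2write = zeros(2*nCol+1, nPt, 'int32') ;
    table2write(1,:) = integ.' ;
    for k=1:nCol
       ixLine = (k-1)*2+2 ;       % first of the two rows that hold column k
       % // typecast reinterprets the bytes of each double as two int32 words
       table2write( ixLine:ixLine+1 , : ) = reshape( typecast(doubl(:,k),'int32') , 2 , [] ) ;
    end
    % // write (fwrite goes column by column, i.e. record by record)
    fid = fopen( outfile4 , 'w' ) ;
    count = fwrite(fid , table2write , 'int32' ) ;
    fclose( fid ) ;
    elapsed4 = toc(tic4)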
    

    Which results in:

    elapsed4 =
       0.794346687070910
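
    As a sanity check, the layout can be verified by reading the file back. The following is a minimal sketch and not part of the timed code: the sample values and the temporary file name are made up for illustration, and it packs all the doubles of a record with a single typecast instead of looping over the columns, which should produce the same bytes.

    ```matlab
    % Small hypothetical sample with the same types as the test variables
    integ = int32([1; 2; 3; 4]);
    doubl = [0.5 1.5 2.5; -1 0 1; pi exp(1) sqrt(2); 1e10 -1e-10 42];
    [nPt, nCol] = size(doubl);

    % Pack: row-major stream of doubles -> int32 words, one record per column
    words  = reshape(typecast(reshape(doubl.', 1, []), 'int32'), 2*nCol, nPt);
    packed = [integ.'; words];                 % (2*nCol+1) x nPt int32

    % Write in one call (fwrite stores the array column by column)
    fname = [tempname '.bin'];                 % hypothetical scratch file
    fid = fopen(fname, 'w');
    fwrite(fid, packed, 'int32');
    fclose(fid);

    % Read back as int32 words, then undo the packing
    fid = fopen(fname, 'r');
    raw = fread(fid, [2*nCol+1, nPt], 'int32=>int32');
    fclose(fid);
    info = dir(fname);
    nBytes = info.bytes;                       % expected: nPt*(4 + 8*nCol)
    delete(fname);

    integBack = raw(1,:).';
    doublBack = reshape(typecast(reshape(raw(2:end,:), 1, []), 'double'), nCol, nPt).';
    ```

    Because typecast only reinterprets bytes, integBack and doublBack should be bit-for-bit identical to the inputs, and the file should hold 28 bytes per record.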
    

    Read on for the definition of the test variables, and for faster methods which, however, change the layout of the file.


    Original answer:
    If you can afford to reorganize your file, you can save a tremendous amount of time.

    Consider the following example:

    outfile1 = 'E:\TEMP\Z_ToDelete\test1.bin' ;
    outfile2 = 'E:\TEMP\Z_ToDelete\test2.bin' ;
    
    nPt = 0.5e6 ;
    integ = int32( randi(32000,nPt,1) ) ;
    doubl = rand(nPt,3) ;
    
    %% // Write to file with mixed precision
    tic1 = tic ;
    fid = fopen( outfile1 , 'w' ) ;
    for k = 1:nPt
      fwrite(fid, integ(k), 'int');
      fwrite(fid, doubl(k,:), 'double');
    end
    fclose( fid ) ;
    elapsed1 = toc(tic1)
    
    %% // write to file sequentially
    tic2 = tic ;
    fid = fopen( outfile2 , 'w' ) ;
    fwrite(fid, integ, 'int');
    fwrite(fid, doubl, 'double');
    fclose( fid ) ;
    elapsed2 = toc(tic2)
    

    On my system, this outputs:

    elapsed1 =
              19.7780466501241
    elapsed2 =
            0.0309073378234669
    

    So letting MATLAB write each full array in a single call, one precision at a time, is vastly more efficient than specifying record by record what to write.

    The downside is that reading a single record back from the saved file is a little more complex, but you can easily write a function that, for a given index, seeks to the integer, reads it, skips the remaining integers, then reads the 3 doubles.
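
    Such a single-record read could look like the sketch below. It assumes the sequential layout of test2.bin (all integers first, then the doubles); the small sample data, the file name and the variable names are hypothetical. One detail to keep in mind: fwrite stores the doubl matrix column by column, so the three doubles of a record are nPt values apart in the file and each one needs its own fseek.

    ```matlab
    % Hypothetical small data set written in the sequential layout
    nPt = 5; nCol = 3;
    integ = int32((1:nPt).' * 10);
    doubl = reshape(1:nPt*nCol, nPt, nCol) + 0.25;
    fname = [tempname '.bin'];                 % hypothetical scratch file
    fid = fopen(fname, 'w');
    fwrite(fid, integ, 'int32');
    fwrite(fid, doubl, 'double');
    fclose(fid);

    % Fetch record k without loading the whole file
    k = 4;
    fid = fopen(fname, 'r');
    fseek(fid, 4*(k-1), 'bof');                % k-th int32 in the integer block
    ik = fread(fid, 1, 'int32=>int32');
    dk = zeros(1, nCol);
    for c = 1:nCol
        % the double block starts after nPt int32; column c starts (c-1)*nPt doubles in
        fseek(fid, 4*nPt + 8*((c-1)*nPt + (k-1)), 'bof');
        dk(c) = fread(fid, 1, 'double');
    end
    fclose(fid);
    delete(fname);
    ```

    Here nPt must be known to the reader (for example from the file size), since the offsets of the double block depend on it.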


    If you really cannot afford to multiplex/demultiplex your data, then you can consider converting your int to double and writing the full array:

    tic3 = tic ;
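    A = [double(integ) doubl] ;   % int32 values are represented exactly in double
    % // note: fwrite stores A column by column; write A.' if each record must be contiguous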
    fid = fopen( outfile2 , 'w' ) ;
    fwrite(fid, A, 'double');
    fclose( fid ) ;
    elapsed3 = toc(tic3)
    

    This is still a lot faster than the initial "mixed precision" solution:

    elapsed3 =
             0.483094789081886
    

    Converting the values back to integers when you read the file will take less time than you spent writing mixed-precision values. The only downside of this method is a slight increase in file size: each record takes 32 bytes instead of 28, about 14% more.
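
    For completeness, reading such an all-double file back might look like the following sketch. The sample values and the file name are hypothetical, and the array is read in the same column-by-column order in which fwrite stored it.

    ```matlab
    % Hypothetical small data set written as one all-double array
    nPt = 4;
    integ = int32([7; -3; 32000; 0]);
    doubl = [0.1 0.2 0.3; 1 2 3; -4.5 5.5 6.5; 1e-3 1e3 1e6];
    fname = [tempname '.bin'];                 % hypothetical scratch file
    fid = fopen(fname, 'w');
    fwrite(fid, [double(integ) doubl], 'double');
    fclose(fid);

    % Read the nPt x 4 array back and restore the integer column
    fid = fopen(fname, 'r');
    A = fread(fid, [nPt, 4], 'double');
    fclose(fid);
    info = dir(fname);
    delete(fname);

    integBack = int32(A(:,1));                 % exact: every int32 fits in a double
    doublBack = A(:,2:end);
    overhead  = info.bytes / (nPt*(4 + 3*8));  % 32 bytes per record versus 28
    ```

    The overhead ratio works out to 32/28, which is where the roughly 14% figure comes from.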