I would like to write a table in binary format, where each row holds 1 integer followed by 3 doubles. Of course I can just use a for loop:
for i=1:sz
    fwrite(fid, integ(i), 'int');
    fwrite(fid, doubl(i,:), 'double');
end
but this is quite slow for arrays with a few million entries. What is the most efficient way to handle this (without having to write a .mex)?
Unfortunately I must keep this [int32 float64 float64 float64] format, since this is a file format specification used by a program.
Edit:
So finally, the fastest way to do it while respecting the exact order and type of each variable is to reinterpret the n-by-3 double table as a (2n)-by-3 int32 array (each double occupies two int32 values), reshape and concatenate the arrays, then write everything in one go.
outfile4 = 'test1.bin' ;
tic4 = tic ;
% // reshape everything
nCol = size(doubl,2) ;   % number of double columns (3 here)
table2write = zeros( 2*nCol+1 , nPt , 'int32' ) ;
table2write(1,:) = integ.' ;
for k=1:nCol
    ixLine = (k-1)*2+2 ;
    % each double becomes 2 consecutive int32 values (same bytes, no conversion)
    table2write( ixLine:ixLine+1 , : ) = reshape( typecast(doubl(:,k),'int32') , 2 , [] ) ;
end
% // write
fid = fopen( outfile4 , 'w' ) ;
count = fwrite(fid , table2write , 'int32' ) ;
fclose( fid ) ;
elapsed4 = toc(tic4)
Which results in:
elapsed4 =
0.794346687070910
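As a cross-check on the record layout, the same 28-byte records can also be assembled at the byte level and decoded again after reading the file back. This is only a sketch under stated assumptions: the small test data and the file name check_interleaved.bin are made up for illustration (they are not the benchmark arrays), and the default native byte order is assumed for both writing and reading.

```matlab
% Hypothetical small test data (not the benchmark arrays)
nPt   = 5 ;
integ = int32( (1:nPt).' * 10 ) ;
doubl = reshape( 1:3*nPt , nPt , 3 ) + 0.5 ;

% One uint8 column per record: 4 bytes of int32 followed by 3*8 bytes of double
intBytes = reshape( typecast( integ , 'uint8' ) , 4 , [] ) ;
dblRow   = reshape( doubl.' , [] , 1 ) ;                 % doubles in record order
dblBytes = reshape( typecast( dblRow , 'uint8' ) , 24 , [] ) ;

fname = fullfile( tempdir , 'check_interleaved.bin' ) ;  % assumed scratch file
fid = fopen( fname , 'w' ) ;
fwrite( fid , [intBytes ; dblBytes] , 'uint8' ) ;        % written column by column
fclose( fid ) ;

% Read the raw bytes back, one 28-byte record per column, and decode them
fid = fopen( fname , 'r' ) ;
raw = fread( fid , [28 nPt] , 'uint8=>uint8' ) ;
fclose( fid ) ;
delete( fname ) ;

integBack = typecast( reshape( raw(1:4,:) , [] , 1 ) , 'int32' ) ;
doublBack = reshape( typecast( reshape( raw(5:28,:) , [] , 1 ) , 'double' ) , 3 , [] ).' ;

assert( isequal( integBack , integ ) && isequal( doublBack , doubl ) )
```

Because typecast only reinterprets bytes, no value is rounded or converted on the way in or out.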
Read below for the definition of the test variables, and for a slightly faster method, which however changes the layout of the data.
Original answer:
If you can afford to reorganize your file, you can gain a tremendous amount of time.
Consider the following example:
outfile1 = 'E:\TEMP\Z_ToDelete\test1.bin' ;
outfile2 = 'E:\TEMP\Z_ToDelete\test2.bin' ;
nPt = 0.5e6 ;
integ = int32( randi(32000,nPt,1) ) ;
doubl = rand(nPt,3) ;
%% // Write to file with mixed precision
tic1 = tic ;
fid = fopen( outfile1 , 'w' ) ;
for k = 1:nPt
    fwrite(fid, integ(k), 'int');
    fwrite(fid, doubl(k,:), 'double');
end
fclose( fid ) ;
elapsed1 = toc(tic1)
%% // write to file sequentially
tic2 = tic ;
fid = fopen( outfile2 , 'w' ) ;
fwrite(fid, integ, 'int');
fwrite(fid, doubl, 'double');
fclose( fid ) ;
elapsed2 = toc(tic2)
On my system, this outputs:
elapsed1 =
19.7780466501241
elapsed2 =
0.0309073378234669
So letting MATLAB handle the writing of your full arrays, one precision at a time, is far more efficient than specifying record by record what to write.
The downside is that reading a single record from the saved file is a little more complex, but you can easily write a function which, for a given index, reads the integer, skips the remaining integers, then reads the 3 doubles.
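A minimal sketch of such a single-record read, assuming the file was produced exactly as in the sequential example (all int32 values first, then the doubl matrix in one fwrite call). Since fwrite stores a matrix column by column, the three doubles of one record are not adjacent in the file, so the sketch seeks to each of them separately. The test data, the file name check_sequential.bin and the index k are hypothetical.

```matlab
% Hypothetical small data set written with the sequential layout
nPt   = 6 ;
integ = int32( (1:nPt).' * 100 ) ;
doubl = rand( nPt , 3 ) ;
fname = fullfile( tempdir , 'check_sequential.bin' ) ;   % assumed scratch file
fid = fopen( fname , 'w' ) ;
fwrite( fid , integ , 'int32' ) ;
fwrite( fid , doubl , 'double' ) ;   % column-major: all of column 1, then 2, then 3
fclose( fid ) ;

% Read record k only
k = 4 ;
fid = fopen( fname , 'r' ) ;
fseek( fid , 4*(k-1) , 'bof' ) ;                 % offset of the k-th int32
intK = fread( fid , 1 , 'int32=>int32' ) ;
dblK = zeros( 1 , 3 ) ;
for j = 1:3
    % skip the integer block (4*nPt bytes), then j-1 full columns, then k-1 rows
    fseek( fid , 4*nPt + 8*( (j-1)*nPt + (k-1) ) , 'bof' ) ;
    dblK(j) = fread( fid , 1 , 'double' ) ;
end
fclose( fid ) ;
delete( fname ) ;

assert( intK == integ(k) && isequal( dblK , doubl(k,:) ) )
```

Writing doubl.' instead of doubl would store the three doubles of each record side by side, so that a single fread of 3 values would be enough; that is a design choice to make when the file is written.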
If you really cannot afford to multiplex/demultiplex your data, then you can consider converting your int to double and writing the full array:
tic3 = tic ;
A = [double(integ) doubl] ;
fid = fopen( outfile2 , 'w' ) ;
fwrite(fid, A, 'double');
fclose( fid ) ;
elapsed3 = toc(tic3)
This is still a lot faster than the initial "mixed precision" solution:
elapsed3 =
0.483094789081886
It will take you less time to convert them back to integers when you read them than you spent writing mixed-precision values. The only downside of this method is a slight increase in file size (about 14%: 32 bytes per record instead of 28).
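Reading such an all-double file back could look like the sketch below. It assumes the matrix was written with a single fwrite call as above, so that it is stored column by column, and it derives the number of records from the file size. The test data and the file name check_alldouble.bin are hypothetical. Every int32 value is exactly representable as a double, so the conversion back to int32 loses nothing.

```matlab
% Hypothetical small data set written as one all-double matrix
nPt   = 6 ;
integ = int32( randi( 32000 , nPt , 1 ) ) ;
doubl = rand( nPt , 3 ) ;
fname = fullfile( tempdir , 'check_alldouble.bin' ) ;    % assumed scratch file
fid = fopen( fname , 'w' ) ;
fwrite( fid , [double(integ) doubl] , 'double' ) ;
fclose( fid ) ;

% Each record takes 4 doubles = 32 bytes, so the record count follows from the size
info = dir( fname ) ;
nRec = info.bytes / (4*8) ;

fid = fopen( fname , 'r' ) ;
A = fread( fid , [nRec 4] , 'double' ) ;   % same column-major shape as when written
fclose( fid ) ;
delete( fname ) ;

integBack = int32( A(:,1) ) ;   % first column back to int32
doublBack = A(:,2:4) ;          % remaining columns are the original doubles

assert( nRec == nPt && isequal( integBack , integ ) && isequal( doublBack , doubl ) )
```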