Tags: matlab, performance, file-io, bit-manipulation, binaryfiles

Speed up processing of large binary files


I have to process thousands of binary files (16 MB each) by reading them in pairs and building a bit-level data structure for each (usually a 1x134217728 array) so that I can process them at the bit level.

Currently I am doing this the following way:

conv = @(c) uint8(bitget(c,1:32));
measurement = NaN(1,(sizeOfMeasurements*8));   % (1,134217728); semicolon added so the array is not printed
fid = fopen(fileName, 'rb');
byteContent = fread(fid,'uint32');
fclose(fid);
bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);
measurement=[bitRepresentation1{:}];

I also tried replacing fopen with memmapfile, as below:

m = memmapfile(fileName,'Format',{'uint32', [4194304 1], 'byteContent'});
byteContent = m.data.byteContent;
byteContent = double(byteContent);

I printed timing information (using tic/toc) for the individual instructions and it turns out that the bottleneck is:

bitRepresentation1 = arrayfun(conv, byteContent, 'UniformOutput', false);  % see first line of code for conv

Are there more efficient ways of transforming byteContent into an array that stores a bit per index (i.e. that is a bit representation of byteContent)?


Solution

  • Let bitget handle the loop over all the numbers (it is vectorized), and loop over the bit positions yourself:

    fid = fopen(fileName, 'rb');
    bitContent = fread(fid,'*ubit64');
    fclose(fid);
    
    conv = @(ii) uint8(bitget(bitContent, ii));
    bitRepresentation = arrayfun(conv, 1:64, 'UniformOutput', false);
    
    measurement = [bitRepresentation{:}]';
    measurement = measurement(:).';
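
    If you want to convince yourself that the transpose-and-flatten step produces the same bit order as the per-element version in the question, here is a minimal sketch on a handful of in-memory values. The `words` vector is a made-up stand-in for the file contents, and I use 32-bit words here so the comparison with the question's `conv` is direct; the same check should carry over to 64 bits.

    ```matlab
    % Hypothetical stand-in for the file contents: a few uint32 words.
    words = uint32([1; 2; 4294967295; 2147483648; 305419896; 0]);

    % Reference: per-element approach from the question (bits 1..32 of each word).
    perWord   = arrayfun(@(c) uint8(bitget(c, 1:32)), words, 'UniformOutput', false);
    reference = [perWord{:}];

    % Per-bit approach: one vectorized bitget call per bit position.
    perBit    = arrayfun(@(ii) uint8(bitget(words, ii)), 1:32, 'UniformOutput', false);
    candidate = [perBit{:}]';      % 32-by-N: one column per word
    candidate = candidate(:).';    % flatten column by column into a 1-by-(32*N) row

    % True if both approaches give the same bits in the same order.
    sameLayout = isequal(reference, candidate);
    ```

    The transpose matters: without it, `(:)` would walk the N-by-32 matrix column by column and you would get all first bits, then all second bits, and so on.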
    

    EDIT: You can also try a direct loop:

    fid = fopen(fileName, 'rb');
    bitContent = fread(fid,'*ubit64');
    fclose(fid);
    
    sz = 64 * size(bitContent,1);    
    measurement3 = zeros(1, sz, 'uint8');
    weave = 1:64:sz;
    for ii = 1:64
        measurement3(weave + ii - 1) = uint8(bitget(bitContent, ii));
    end
    

    On my system, though, that is (surprisingly) slower than arrayfun. My MATLAB version is from the stone age, so your mileage may vary; give it a try.
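
    To compare the two variants on your own machine, something along these lines should do. It is only a rough sketch: the input is synthetic random data rather than one of your files, the size `n` is an arbitrary choice, and a single tic/toc run is a coarse measurement, so repeat it a few times before drawing conclusions.

    ```matlab
    % Synthetic input (assumed for illustration): n random 64-bit words,
    % built from two random 32-bit halves.
    n  = 2^16;
    hi = uint64(randi([0, 2^32 - 1], n, 1));
    lo = uint64(randi([0, 2^32 - 1], n, 1));
    bitContent = bitor(bitshift(hi, 32), lo);

    % Variant A: arrayfun over the 64 bit positions.
    tic;
    perBit = arrayfun(@(ii) uint8(bitget(bitContent, ii)), 1:64, 'UniformOutput', false);
    measurementA = [perBit{:}]';
    measurementA = measurementA(:).';
    tArrayfun = toc;

    % Variant B: explicit loop writing into a preallocated row vector.
    tic;
    sz = 64 * numel(bitContent);
    measurementB = zeros(1, sz, 'uint8');
    weave = 1:64:sz;
    for ii = 1:64
        measurementB(weave + ii - 1) = uint8(bitget(bitContent, ii));
    end
    tLoop = toc;

    % Report timings and confirm that both variants agree.
    fprintf('arrayfun: %.3f s, loop: %.3f s, identical: %d\n', ...
            tArrayfun, tLoop, isequal(measurementA, measurementB));
    ```

    Whichever variant wins, check that `identical` prints 1 before trusting the timing.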