
Check whether a MAT-file is corrupt without loading it


I have a data set consisting of a large number of .mat files. Each file is large, so loading it is time-consuming. Unfortunately, some of the files are corrupt, and load('<name>') returns an error on them. I have implemented a try-catch routine to determine which files are corrupt. However, since only a handful of them are corrupt, loading every file just to check it takes too long. Is there any way to check the health of a .mat file without using load('<name>')?

I have not been able to find such a solution anywhere.


Solution

  • The matfile function is used to access variables in MAT-files without loading them into memory. By changing your try-catch routine to use matfile instead of load, you avoid the overhead of loading the large files into memory.

    As matfile appears to only issue a warning when reading a corrupt file, you'll have to check if this warning was issued. This can be done using lastwarn: clear lastwarn before calling matfile, and check if the warning was issued afterwards:

    lastwarn('');
    matfile(...);
    [~, warnId] = lastwarn;
    if strcmp(warnId, 'relevantWarningId')
        % File is corrupt
    end
    

    You will have to find out the relevant warning id first, by running the above code on a corrupt file, and saving the warnId.

    A more robust solution would be to calculate a checksum or hash (e.g. MD5) of each file when it is created, and to compare against this checksum before reading the file.
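
The matfile and lastwarn check described above could be applied to a whole folder roughly as follows. This is a minimal sketch, not a tested recipe: dataDir is a placeholder, the call to whos on the matfile object is an added assumption (used here to force the variable list to be read), and any warning or error at all is treated as a sign of corruption. Once the relevant warning id is known, the test can be narrowed to that id with strcmp.

```matlab
% Sketch: flag MAT-files that matfile cannot open cleanly.
% ASSUMPTIONS: dataDir is a placeholder folder; ANY warning or error
% raised while opening a file is counted as corruption.
dataDir = pwd;                                  % hypothetical: folder holding the data set
files   = dir(fullfile(dataDir, '*.mat'));

isCorrupt = false(numel(files), 1);
warnIds   = cell(numel(files), 1);              % ids seen per file, for later inspection

for k = 1:numel(files)
    fname = fullfile(dataDir, files(k).name);
    lastwarn('', '');                           % clear last warning message and id
    try
        m    = matfile(fname);                  % opens the file without loading variables
        info = whos(m); %#ok<NASGU>             % assumption: forces the variable list to be read
        [warnMsg, warnIds{k}] = lastwarn;
        isCorrupt(k) = ~isempty(warnMsg);       % a warning was issued for this file
    catch err
        warnIds{k}   = err.identifier;          % matfile or whos raised an error instead
        isCorrupt(k) = true;
    end
end

corruptFiles = {files(isCorrupt).name};         % names of the suspect files
```

Inspecting warnIds for a file that is known to be corrupt gives the id to compare against in place of 'relevantWarningId'.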
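
The checksum idea could be sketched like this. It assumes MATLAB is running with its JVM, since it borrows Java's java.security.MessageDigest to compute the MD5; the demo file name and the variable names are hypothetical. Note that hashing still reads every byte of the file from disk, although it does not parse the contents into MATLAB variables.

```matlab
% Sketch: record an MD5 hash when a file is written, verify it before reading.
% ASSUMPTIONS: the JVM is available; fname is a hypothetical demo file.
fname = fullfile(tempdir, 'checksum_demo.mat');
x = magic(4);
save(fname, 'x');

% At creation time: hash the raw bytes and keep the result somewhere safe.
fid   = fopen(fname, 'r');
bytes = fread(fid, Inf, '*uint8');              % raw file contents as uint8
fclose(fid);
md         = java.security.MessageDigest.getInstance('MD5');
storedHash = sprintf('%02x', typecast(md.digest(bytes), 'uint8'));

% Later, before loading: recompute the hash and compare.
fid   = fopen(fname, 'r');
bytes = fread(fid, Inf, '*uint8');
fclose(fid);
md          = java.security.MessageDigest.getInstance('MD5');
currentHash = sprintf('%02x', typecast(md.digest(bytes), 'uint8'));

isIntact = strcmp(storedHash, currentHash);     % false means the file changed on disk
```

In practice the stored hash would be written to a separate index or sidecar file next to each .mat file. This only detects corruption that happens after the hash was recorded.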