I have a data set consisting of a large number of .mat files. Each .mat file is large, so loading it is time-consuming. Unfortunately, some of the files are corrupt, and load('<name>') returns an error on them. I have implemented a try-catch routine to determine which files are corrupt. However, since only a handful of them are corrupt, loading every file just to check whether it is corrupt takes too long. Is there any way to check the health of a .mat file without using load('<name>')?
I have been unsuccessful in finding such a solution anywhere.
The matfile function accesses variables in MAT-files without loading them into memory. By changing your try-catch routine to use matfile instead of load, you avoid the overhead of loading the large files into memory.
As matfile appears to only issue a warning when reading a corrupt file, you'll have to check whether this warning was issued. This can be done using lastwarn: reset lastwarn before calling matfile, and check whether the warning was issued afterwards:
lastwarn('');                 % reset the last warning message
matfile(...);
[~, warnId] = lastwarn;       % second output is the warning identifier
if strcmp(warnId, 'relevantWarningId')
    % File is corrupt
end
You will have to find out the relevant warning ID first, by running the above code on a known-corrupt file and recording the value of warnId.
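The approach above could be applied to a whole folder roughly as follows. This is only a sketch under some assumptions: findSuspectMatFiles and dataDir are hypothetical names, and because the relevant warning ID is not known yet, the function treats any warning or error raised while opening a file as a sign of trouble and records its identifier for inspection.

```matlab
function [suspect, warnIds] = findSuspectMatFiles(dataDir)
% Sketch: list MAT-files in dataDir that matfile cannot open cleanly.
% findSuspectMatFiles and dataDir are hypothetical names, not from the question.
    files = dir(fullfile(dataDir, '*.mat'));
    suspect = {};   % full paths of files that raised a warning or an error
    warnIds = {};   % matching warning or error identifiers
    for k = 1:numel(files)
        fname = fullfile(dataDir, files(k).name);
        lastwarn('', '');              % reset both the message and the identifier
        warnId = '';
        try
            matfile(fname);            % opens the file without loading its variables
            [warnMsg, warnId] = lastwarn;
            % Assumption: any warning at this point marks the file as suspect.
            flagged = ~isempty(warnMsg);
        catch err
            % In case matfile throws an error instead of warning.
            warnId = err.identifier;
            flagged = true;
        end
        if flagged
            suspect{end+1} = fname;    %#ok<AGROW>
            warnIds{end+1} = warnId;   %#ok<AGROW>
        end
    end
end
```

Once you have seen which identifier a corrupt file produces, you could replace the ~isempty(warnMsg) test with a strcmp against that identifier so that unrelated warnings are not counted.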
A more robust solution would be to calculate a checksum or hash (e.g. MD5) of each file upon creation, and to compare it against a freshly computed checksum before reading the file.
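One possible way to compute such a hash from within MATLAB is sketched below. It assumes MATLAB is running with its JVM enabled, since it relies on Java's java.security.MessageDigest class; fileMd5 is a hypothetical helper name. Note that hashing still reads the whole file from disk, although it avoids parsing the contents into MATLAB variables.

```matlab
function hex = fileMd5(fname)
% Sketch: MD5 of a file as a lowercase hex string, read in chunks.
% fileMd5 is a hypothetical helper; it assumes the JVM is available.
    fid = fopen(fname, 'r');
    assert(fid ~= -1, 'Could not open %s', fname);
    cleanup = onCleanup(@() fclose(fid));     % close the file even on error
    md = java.security.MessageDigest.getInstance('MD5');
    chunkSize = 2^20;                         % read 1 MiB at a time
    while true
        bytes = fread(fid, chunkSize, '*uint8');
        if isempty(bytes)
            break;
        end
        md.update(bytes);
    end
    digest = typecast(md.digest(), 'uint8');  % Java returns signed bytes
    hex = lower(reshape(dec2hex(double(digest), 2)', 1, []));
end
```

You would store the returned string next to each file when it is created, and later compare it with strcmp against a freshly computed value before loading the file.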