Robust reading of a tabulated data file with Matlab

I'm trying to access the numerical data in a text file that is produced by a measurement device. The files are a mix of strings and numbers, and items are separated by tabs. It is not rectangular, which makes them not suitable for matlab's file loading functions, such as dlmread. I can convert the files to exel spreadsheets and load them with xlsread, but as I am dealing with hundreds of files, it is not an option for me.

Here is what the files look like:

Heading String  
Commenttext 
X Axis  Trace A Trace B 
Frequency   Funct Ch1   Funct Ch2   
Frequency / Hz  Funct Ch1 / dBr Funct Ch2 / dBV 
Hz  dBr dBV 
LOG LIN LIN 
100 -60.35  -71.1446    
10000   -7.48178    -12.0321    
VOID    VOID    VOID    
VOID    VOID    VOID    
Scan #-5    Labeltext   Labeltext   
Hz  V   V   
1.000000000000e+003 3.999999910593e-002 1.000000000000e+000 
A-X/37   A-Y     A-LimLow    A-LimUpp   A-RefY in V     B-X/37   B-Y    B-LimLow    B-LimUpp    B-RefY in V     
1.000000000000e+002 -4.873095199691e+001    VOID    VOID    3.999999910593e-002 1.000000000000e+002 -2.026775796775e+001    VOID    VOID    1.000000000000e+000 
1.172102297533e+002 -4.492478734843e+001    VOID    VOID    3.999999910593e-002 1.172102297533e+002 -2.024411835772e+001    VOID    VOID    1.000000000000e+000 
1.373823795883e+002 -3.994765661259e+001    VOID    VOID    3.999999910593e-002 1.373823795883e+002 -2.022767912575e+001    VOID    VOID    1.000000000000e+000 
1.610262027561e+002 -3.628116388971e+001    VOID    VOID    3.999999910593e-002 1.610262027561e+002 -2.021939551014e+001    VOID    VOID    1.000000000000e+000 
1.887391822135e+002 -3.285059881019e+001    VOID    VOID    3.999999910593e-002 1.887391822135e+002 -2.021928320409e+001    VOID    VOID    1.000000000000e+000 
2.212216291070e+002 -2.987476652701e+001    VOID    VOID    3.999999910593e-002 2.212216291070e+002 -2.022570411546e+001    VOID    VOID    1.000000000000e+000 
2.500000000000e+002 -2.751190343935e+001    VOID    VOID    3.999999910593e-002 2.500000000000e+002 -2.023346726318e+001    VOID    VOID    1.000000000000e+000 
2.592943797405e+002 -2.728162367715e+001    VOID    VOID    3.999999910593e-002 2.592943797405e+002 -2.023601503648e+001    VOID    VOID    1.000000000000e+000

etc.

Note that: - The value in front of Scan #-5 tells me how many repetitions there are in the file - A-X/37 tells me how many lines per scan there are. - The 'VOID' elements are replaced by actual numbers when applicable, and vice versa.

I would like to be able to extract the values that are present, and that my script is not dependent on the size of the traces, the number of repetitions or whether the values are numbers or 'VOID'.

So far, based on the tips given in other treads, I was able to create an array with the rows of the file and extract the 2 first numbers of each row like this:

fid=fopen('data.txt');
Rows = textscan(fid,'%s', 'delimiter','\n'); %Creates a temporary array with the rows of the file
fclose(fid);
TraceStarts=strfind(Rows{1,1},'Scan'); %Looks for the start of each trace..
TracesIdx = find(~cellfun('isempty', TraceStarts)); %.. and stores the indexes.
Traces= cellfun(@(x) textscan(x,'%f','delimiter','\t', 'CollectOutput',1), Rows{1,1});

Traces is then a cell containing arrays for each row with at max the two first numbers per row. Why it stops searching for float items in a row after it found 2 is a mystery to me... I don't want to specify the sequence of numbers and strings in each row, because they might vary.

Any help would be much appreciated.

Solution

Your textscan is only reporting two elements due to the VOID elements. Simply tell textscan to treat those as 'empty' so it reads them as NaN. See the TreatAsEmpty name-value pair in the textscan documentation.

>> str = '1.000000000000e+002 -4.873095199691e+001    VOID    VOID    3.999999910593e-002 1.000000000000e+002 -2.026775796775e+001    VOID    VOID    1.000000000000e+000';
>> textscan(str,'%f','delimiter','\t', 'CollectOutput',1)
ans = 
    [2x1 double]
>> textscan(str,'%f','delimiter','\t', 'CollectOutput',1,'treatAsEmpty',{'VOID'})
ans = 
     [10x1 double]
>> ans{1}
ans =
                       100
           -48.73095199691
                       NaN
                       NaN
          0.03999999910593
                       100
           -20.26775796775
                       NaN
                       NaN
                         1