Matlab textscan introducing additional rows with zeros or NaNs


I'm trying to read a .dat file containing tens of thousands of rows, where each of them looks something like:

   1.9681968    0   0   19.996  0   61  100 1.94E-07    6.62E-07  
   2.330233     0   0   19.996  0   61  100 1.94E-07    6.62E-07
   2.6512651    0   0   19.997  0   61  100 1.94E-07    6.62E-07
   3.5923592    0   0   19.998  0   61  100 1.96E-07    6.62E-07

Now for example, I'm trying to read it with

    Data = textscan(fid, '%.9f%*f%*f%.9f%*f%*f%*f%.9f')

where the format string depends on which columns I want to read.

When reading big files, the first column of the cell array 'Data' will become

    1.96819680000000
    0
    2.33023300000000
    2.65126510000000
    0
    3.59235920000000
    0

and the rest of the columns will show NaNs instead of the zeros. There are almost as many additional rows as there are rows in the data file, so the resulting arrays are almost twice as large as they should be.

I guess this has something to do with errors when reading doubles, since this problem doesn't occur if I try to read the file as strings.

But if possible, I would like to avoid reading everything as strings and then having to convert everything to doubles.

Any ideas?


Solution

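  • I think the issue is with the format string: it contains eight conversion specifiers, but each row of your data has nine columns, so the ninth value on each line is probably being read as the start of a new row. Try a format string with one specifier per column, as in the code that follows.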

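    fid = fopen('test.txt');
    % Original format, with only eight specifiers for nine columns:
    % data = textscan(fid, '%.9f%*f%*f%.9f%*f%*f%*f%.9f')
    % One %f per column, nine in total:
    data = textscan(fid, '%f %f %f %f %f %f %f %f %f');
    fclose(fid);
    data = cell2mat(data)   % no semicolon, so the matrix is displayed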
    

    Here test.txt is a text file containing your example data. The code above gives the following output:

    1.9682         0         0   19.9960         0   61.0000  100.0000    0.0000       NaN
    2.3302         0         0   19.9960         0   61.0000  100.0000    0.0000    0.0000
    2.6513         0         0   19.9970         0   61.0000  100.0000    0.0000    0.0000
    3.5924         0         0   19.9980         0   61.0000  100.0000    0.0000    0.0000
    

    Notice the NaN in the first row, where the line contained only eight values. To specify a different default value for lines that contain fewer values than the format expects, use the EmptyValue option:

    data = textscan(fid, '%f %f %f %f %f %f %f %f %f','EmptyValue', 42);
    

    Then you will get:

    1.9682         0         0   19.9960         0   61.0000  100.0000    0.0000   42.0000
    2.3302         0         0   19.9960         0   61.0000  100.0000    0.0000    0.0000
    2.6513         0         0   19.9970         0   61.0000  100.0000    0.0000    0.0000
    3.5924         0         0   19.9980         0   61.0000  100.0000    0.0000    0.0000
    

    You can then get the first column by indexing the resulting matrix with data(:,1), which outputs the following:

    1.9682
    2.3302
    2.6513
    3.5924