
Extracting numeric data from text in data files in MATLAB


I have a .txt data file which has a few rows of text comments at the beginning, followed by columns of actual data. It looks something like this:

lens (mm): 150
Power (uW): 24.4
Inner circle: 56x56
Outer Square: 256x320
remarks: this run looks good            
2.450000E+1 6.802972E+7 1.086084E+6 1.055582E-5 1.012060E+0 1.036552E+0
2.400000E+1 6.866599E+7 1.088730E+6 1.055617E-5 1.021491E+0 1.039043E+0
2.350000E+1 6.858724E+7 1.086425E+6 1.055993E-5 1.019957E+0 1.036474E+0
2.300000E+1 6.848760E+7 1.084434E+6 1.056495E-5 1.017992E+0 1.034084E+0

When I use importdata, MATLAB automatically separates the text from the numeric data. But how do I extract the numeric values from the text (which is stored as a cell array)? What I want to achieve:

  1. extract those numbers (e.g. 150, 24.4)
  2. If possible, extract the names ('lens', 'Power')
  3. If possible, extract the units ('mm', 'uW')

Item 1 is the most important; items 2 and 3 are optional. I am also happy to change the format of the text comments if that simplifies the code.


Solution

  • Assuming your sample data is saved as demo.txt, you can do the following:

    function q47203382
    %% Reading from file:
    COMMENT_ROWS = 5;
    % Read info rows:
    fid = fopen('demo.txt','r'); % open for reading
    txt = textscan(fid,'%s',COMMENT_ROWS,'delimiter', '\n'); txt = txt{1};
    fclose(fid);
    % Read data rows:
    numData = dlmread('demo.txt',' ',COMMENT_ROWS,0);
    %% Processing:
    desc = cell(COMMENT_ROWS,1);
    unit = cell(2,1);              % only the first two rows carry a unit
    quant = cell(COMMENT_ROWS,1);
    for ind1 = 1:numel(txt)
      if ind1 <= 2
        [desc{ind1}, unit{ind1}, quant{ind1}] = readWithUnit(txt{ind1});
      else
        [desc{ind1},             quant{ind1}] = readWOUnit(txt{ind1});
      end
    end
    %% Display:
    disp(desc);
    disp(unit);
    disp(quant);
    disp(mat2str(numData));
    end
    
    function [desc, unit, quant] = readWithUnit(str)
      tmp = strsplit(str,{' ','(',')',':'});
      [desc, unit, quant] = tmp{:};
    end
    
    function [desc, quant] = readWOUnit(str)
      tmp = strtrim(strsplit(str,': '));   
      [desc, quant] = tmp{:};
    end
    

    We read the file in two stages: textscan for the comment rows at the beginning, and dlmread for the numeric data that follows. After that, it is a matter of splitting each comment row to obtain the individual pieces of information (description, unit, and quantity).
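
    As an alternative to the two strsplit helpers, the splitting step could also be sketched with a single regular expression that treats the unit as optional. This is not the method used above, only a minimal sketch; the variable names (headerLines, pat, names, units, values) are made up for illustration, and the sample lines are copied from the question.

    ```matlab
    % Sample header lines copied from the question (hypothetical variable name).
    headerLines = {'lens (mm): 150', 'Power (uW): 24.4', 'Inner circle: 56x56'};

    % name  : everything before an opening parenthesis or the colon
    % unit  : text between the parentheses, empty when there are none
    % value : everything after the colon
    pat = '^(?<name>[^(:]*)\(?(?<unit>[^):]*)\)?:\s*(?<value>.*)$';

    n = numel(headerLines);
    names  = cell(n,1);
    units  = cell(n,1);
    values = cell(n,1);
    for k = 1:n
      tok = regexp(headerLines{k}, pat, 'names', 'once');
      names{k}  = strtrim(tok.name);   % drop the space before '('
      units{k}  = tok.unit;            % '' for rows without a unit
      values{k} = strtrim(tok.value);  % still text at this point
    end
    ```

    With this approach there is no need to hard-code which rows have a unit, although a row without any colon would not match the pattern and would need separate handling.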

    Here's the output of the above:

    >> q47203382
        'lens'
        'Power'
        'Inner circle'
        'Outer Square'
        'remarks'
    
        'mm'
        'uW'
    
        '150'
        '24.4'
        '56x56'
        '256x320'
        'this run looks good'
    
        [24.5 68029720 1086084 1.055582e-05 1.01206  1.036552;
         24   68665990 1088730 1.055617e-05 1.021491 1.039043;
         23.5 68587240 1086425 1.055993e-05 1.019957 1.036474;
         23   68487600 1084434 1.056495e-05 1.017992 1.034084]
    

    (I took the liberty of formatting the output a bit for easier viewing.)

    Note that the quantities are returned as text. To convert them to numbers, see also: str2double.
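
    A minimal sketch of that conversion, assuming quant holds the text values shown in the output above. str2double returns NaN for entries that are not plain numbers, so dimension strings such as '56x56' need separate parsing; sscanf is one possible choice for that, not something the code above does.

    ```matlab
    % Text quantities as produced by the code above.
    quant = {'150'; '24.4'; '56x56'; '256x320'; 'this run looks good'};

    % Convert every entry; text that is not a plain number becomes NaN.
    vals = str2double(quant);
    isNumeric = ~isnan(vals);          % true for the first two rows only

    % Dimension strings can be parsed separately, e.g. with sscanf.
    dims = sscanf(quant{3}, '%dx%d')'; % row vector holding the two sizes
    ```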