
Conditional text import or import by header name - MATLAB


Is there a way to perform conditional text import within MATLAB? e.g. with a tab-delimited .txt file in this format:

Type    A   B   C   D   E
 A    5000  2   5   16  19
 A    5000  3   4   5   4
 A    5000  4   1   4   5
 B    500   19  8   2   7
 B    500   18  9   8   1
 B    500   2   9   13  2
 B    100   3   10  15  9
 B    5000  4   15  14  10

Is there a method to import only those lines where Column A contains '5000'?

This would be preferable to importing the entire .txt file and separating the data afterward because, in reality, my text files are rather large (~200MB each). However, if there is a way to do that quickly, it would also be a suitable solution.

Alternatively, is there a method (similar to R) to import and handle data using the headers contained in the .txt file, e.g. importing 'Type', 'A', 'B' and 'D' whilst ignoring 'C' and 'E' in the above example? This is needed because the input file format is flexible: additional columns are sometimes added, so the relative positions of the columns change.


Solution

  • You might try reading the input file line by line and checking whether each line contains the reference value (5000 in this case) in the reference column (column 2 in this case).

    If it does, you store the line; otherwise, you discard it.

    In the following code, based on your template, you define the reference column and the reference value at the beginning.

    You can then convert the cell array output to a numeric array.

    % Define the column index
    col_idx=2;
    % Define the reference value
    ref_value=5000;
    % Open input file
    fid=fopen('in.txt');
    % Read (and discard) the header line
    tline = fgetl(fid);
    % Initialize output variable (one row of cells per stored line)
    data={};
    % Read the file line by line
    while true
       % Read the line
       tline = fgetl(fid);
       % Check for the end of file
       if ~ischar(tline)
          break
       end
       % Split the line into its fields; %s skips leading whitespace
       % and also handles labels longer than one character
       c=textscan(tline,'%s%f%f%f%f%f');
       % If the reference column contains the ref value, then store the input data
       % (isequal also returns false for empty lines)
       if isequal(c{col_idx},ref_value)
          data=[data;c];
       end
    end
    fclose(fid);
    % Convert the numeric columns from cell to array
    c=data(:,2:end);
    num_data=cell2mat(c);
    % Convert first column to char (each entry is a 1x1 cell holding the label)
    lab=char(vertcat(data{:,1}));
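
    For ~200MB files the line-by-line loop may be slow, and it does not address selecting columns by header name. The following is only a minimal sketch of one possible alternative, not a tested solution for your real files: it reads the header line, builds a textscan format that skips unwanted columns with %*, reads the rest of the file in a single call, and then filters the rows with a logical mask. It assumes that 'Type' is the only text column, that the file is whitespace/tab-delimited, and that strsplit is available (R2013a or later). The variable names (wanted, ref_name, mask, ...) and the temporary sample file are made up for illustration; the sample rows are a subset of the ones in the question.

    ```matlab
    % Write a small sample file (assumption: same layout as in the question)
    in_file = [tempname '.txt'];
    fid = fopen(in_file, 'w');
    fprintf(fid, ['Type\tA\tB\tC\tD\tE\n' ...
                  'A\t5000\t2\t5\t16\t19\n' ...
                  'A\t5000\t3\t4\t5\t4\n' ...
                  'B\t500\t19\t8\t2\t7\n' ...
                  'B\t5000\t4\t15\t14\t10\n']);
    fclose(fid);

    % Headers to import, and the column/value used to filter rows
    wanted    = {'Type','A','B','D'};
    ref_name  = 'A';
    ref_value = 5000;

    fid = fopen(in_file);
    % Read the header line and split it into column names
    header = strsplit(strtrim(fgetl(fid)));
    % Flag the columns to keep, wherever they are located in the file
    keep = ismember(header, wanted);
    % Build the format string: %s/%f to read a column, %*s/%*f to skip it
    fmt = '';
    for k = 1:numel(header)
        if strcmp(header{k}, 'Type')
            spec = 's';   % text column (assumption: only 'Type' is text)
        else
            spec = 'f';   % numeric column
        end
        if keep(k)
            fmt = [fmt '%' spec];
        else
            fmt = [fmt '%*' spec];
        end
    end
    % Read all the remaining rows in a single call
    c = textscan(fid, fmt);
    fclose(fid);
    delete(in_file);

    % Names of the imported columns, in file order
    names = header(keep);
    % Logical mask of the rows whose reference column holds the ref value
    mask = c{strcmp(names, ref_name)} == ref_value;
    % Apply the mask to every imported column
    c = cellfun(@(col) col(mask), c, 'UniformOutput', false);
    ```

    After this, c{k} holds the filtered column called names{k}. Note that this approach still reads all the rows of the selected columns into memory before discarding the unwanted ones, so whether it is actually faster or lighter than the loop on your files would need to be measured.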
    

    Hope this helps.