
Reading a fixed-width text file in MATLAB


I have a text file like the one below:

TestData                                                                     

  6.84 11.31 17.51 22.62 26.91 31.98 36.47 35.85 28.47 20.57 10.50  6.37  test1
  0.24  2.62  4.94  7.17 10.39 15.37 18.73 18.29 12.26  6.46  1.15 -0.33  test2
 68.47 95.04156.07218.39304.31320.22311.69269.22203.01135.60 68.18 55.09  test3

 68.47 95.04156.07218.39304.31320.22311.69269.22203.01135.60 68.18 55.09  test4
...

As you can see, the first two lines are comments to ignore. Each data line also ends with a comment. Each number occupies a fixed-width field of 6 characters (%6f), and there are blank lines between some of the data lines.

I want to read all the numbers into a matrix to make plots. I tried to use textscan, but had trouble ignoring the last column, skipping the blank lines, and reading numbers that run together (e.g., some of the numbers in the test4 line).

Here is the code I have so far:

data=dir('*.txt');
formatspecific='%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f';
for i=1:length(data);
    TestData1=data(i).name;
    tempData=textscan(TestData1,formatspecific,'HeaderLines',2);
end

Can anybody help with sample code to improve the textscan part?


Solution

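  • textscan does not open files by itself: when its first argument is a character vector, it parses that text, so your code is scanning the file name rather than the file's contents. You have to open the file before calling textscan and close it afterwards; use

    • fopen to open the input file and obtain the file identifier to pass to textscan
    • fclose to close the input file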
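A minimal sketch of the open/read/close sequence for a single file. To keep it self-contained, it first writes two of the sample lines to a temporary file whose name comes from tempname (an assumption of this sketch, not part of the original setup). The key point is that textscan receives the file identifier fid, not the file name.

```matlab
% Write a small test file with the same layout as the question's data:
% two header lines, then fixed-width data lines with a trailing comment
file_name = [tempname '.txt'];
fid = fopen(file_name, 'wt');
fprintf(fid, 'TestData\n\n');
fprintf(fid, '  6.84 11.31 17.51 22.62 26.91 31.98 36.47 35.85 28.47 20.57 10.50  6.37  test1\n');
fprintf(fid, '  0.24  2.62  4.94  7.17 10.39 15.37 18.73 18.29 12.26  6.46  1.15 -0.33  test2\n');
fclose(fid);

% Open the file: textscan needs the numeric file identifier, not the name
fid = fopen(file_name, 'rt');
if fid < 0
    error('Cannot open %s', file_name);
end
% 12 fixed-width numeric fields plus the trailing comment string
C = textscan(fid, [repmat('%6.2f', 1, 12) '%s'], 'HeaderLines', 2);
% Close the file as soon as textscan has finished, then remove the temp file
fclose(fid);
delete(file_name);

values = [C{1:end-1}];   % 2-by-12 numeric matrix
comments = C{end};       % cell array holding 'test1' and 'test2'
```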

    textscan returns a cell array with the content read from the input file. Since you are reading more than one file, you need to change how you store the cell array returned by textscan: as your code stands, the data are overwritten at each iteration.

    One possibility is to store the data in an array of structs with, for example, two fields: the name of the input file and the data.
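A short sketch of how such a struct array can be built and used. The file names and the (truncated, two-column) data below are hypothetical placeholders; in practice each data field would hold the matrix read from one file.

```matlab
% Hypothetical example: two files' worth of data stored as an array of
% structs with the fields f_name and data
the_data = struct('f_name', {'input_file1.txt', 'input_file2.txt'}, ...
                  'data',   {[6.84 11.31; 0.24 2.62], [68.47 95.04]});
% Loop over the files and report how many rows each one contributed
for k = 1:numel(the_data)
    fprintf('%s: %d rows\n', the_data(k).f_name, size(the_data(k).data, 1));
end
% Collect all file names at once as a cell array
names = {the_data.f_name};
```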

    Another possibility is to build a single struct whose fields each contain the data read from one input file; the field names can be generated automatically from the file names.
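A sketch of generating the field names automatically, assuming hypothetical file names. File names are not always valid MATLAB identifiers (e.g. a hyphen), so this version strips the extension with fileparts and passes the rest through matlab.lang.makeValidName; the placeholder data are only there to fill the fields.

```matlab
% Hypothetical file names, as dir('input_file*.txt') might return them
file_names = {'input_file1.txt', 'input_file-2.txt'};
data_struct = struct();
for k = 1:numel(file_names)
    % Drop the extension, then turn the rest into a valid field name
    % ('input_file-2' is not a valid identifier; makeValidName fixes it)
    [~, base_name] = fileparts(file_names{k});
    field = matlab.lang.makeValidName(base_name);
    data_struct.(field) = k * ones(2, 12);   % placeholder data
end
field_list = fieldnames(data_struct)
```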

    A third possibility is to store all the data in a single matrix.
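If you go with a single matrix, one option (a design choice of this sketch, not part of the original answer) is to collect each file's block in a cell array and concatenate once at the end, keeping a parallel vector that records which file each row came from. The blocks below are hypothetical, truncated placeholders.

```matlab
% Hypothetical per-file blocks (in practice each is [C{1:end-1}] for one file)
blocks = {[6.84 11.31; 0.24 2.62], [68.47 95.04]};
% Concatenate all blocks once instead of growing the matrix inside the loop
m_data = vertcat(blocks{:});
% Optional: remember which file each row of m_data came from
n_rows = cellfun(@(b) size(b, 1), blocks);
file_idx = repelem(1:numel(blocks), n_rows)';
```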

    The script below implements these three alternatives.

    Updated code (following a comment)

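    To read run-together values such as 95.04156.07 correctly as 95.04 and 156.07, change the format specifier from %6f to %6.2f: the precision limits each field to two digits after the decimal point, so textscan stops at 95.04 and starts the next field at 156.07.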
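The effect of %6.2f can be checked directly on a string, since textscan also accepts a character vector as input. The string below is the start of the test3 line from the question, cut to its first four fields for brevity.

```matlab
% First four fields of the run-together test3 line from the question
txt = ' 68.47 95.04156.07218.39';
% Apply the fixed-width, two-decimal format repeatedly to the string
C = textscan(txt, '%6.2f');
values = C{1}'
```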

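    % Get the list of input files
    data = dir('input_file*.txt');
    % Number of numeric data columns
    n_data_col = 12;
    % Number of header lines
    n_header = 2;
    % Build the format specifier: 12 fixed-width numeric fields plus the
    % trailing comment (%6.2f instead of the old %6f, so that run-together
    % values such as 95.04156.07 are split correctly)
    formatspecific = [repmat('%6.2f', 1, n_data_col) '%s'];
    % Initialize the m_data matrix (if you know in advance the number of rows
    % of each input file, you can preallocate it instead)
    m_data = [];
    % Preallocate the "the_data" struct array and create the data_struct struct
    the_data = struct('f_name', cell(1, numel(data)), 'data', cell(1, numel(data)));
    data_struct = struct();
    % Loop over the input files
    for i = 1:numel(data)
       % Get the i-th file name
       file_name = data(i).name;
       % Open the i-th input file and check that the open succeeded
       fp = fopen(file_name, 'rt');
       if fp < 0
          error('Cannot open file %s', file_name);
       end
       % Read the i-th input file
       C = textscan(fp, formatspecific, 'HeaderLines', n_header);
       % Close the input file
       fclose(fp);
       % Join the 12 numeric columns into an N-by-12 matrix
       % (C{end} holds the trailing comments, which are discarded)
       values = [C{1:end-1}];
       % Assign the read data to the "the_data" struct array
       the_data(i).f_name = file_name;
       the_data(i).data = values;
       % Assign the data to a struct whose fields are named after the input
       % file (extension removed, converted to a valid field name)
       [~, base_name] = fileparts(file_name);
       data_struct.(matlab.lang.makeValidName(base_name)) = values;
       % Append the data to the m_data matrix
       m_data = [m_data; values];
    end
    % Display the combined matrix
    m_data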
    

    Input file

    TestData                                                                     
    
      6.84 11.31 17.51 22.62 26.91 31.98 36.47 35.85 28.47 20.57 10.50  6.37  test1
      0.24  2.62  4.94  7.17 10.39 15.37 18.73 18.29 12.26  6.46  1.15 -0.33  test2
     68.47 95.04156.07218.39304.31320.22311.69269.22203.01135.60 68.18 55.09  test3
    
     68.47 95.04156.07218.39304.31320.22311.69269.22203.01135.60 68.18 55.09  test4
    

    Output

    m_data =
    
      Columns 1 through 7
    
        6.8400   11.3100   17.5100   22.6200   26.9100   31.9800   36.4700
        0.2400    2.6200    4.9400    7.1700   10.3900   15.3700   18.7300
       68.4700   95.0400  156.0700  218.3900  304.3100  320.2200  311.6900
       68.4700   95.0400  156.0700  218.3900  304.3100  320.2200  311.6900
    
      Columns 8 through 12
    
       35.8500   28.4700   20.5700   10.5000    6.3700
       18.2900   12.2600    6.4600    1.1500   -0.3300
      269.2200  203.0100  135.6000   68.1800   55.0900
      269.2200  203.0100  135.6000   68.1800   55.0900
    

    Hope this helps.