Reading a large, oddly formatted text file of data into MATLAB


I need to read a huge text file of oddly formatted data. The format looks like this:

//Header with Title Info

//Header with Test1 Info
//More Test1 Info
0,-156.875956035285
1.953125,-4.82866496038806
3.90625,-8.93502887648155
5.859375,-9.76964479822559
7.8125,-14.9767168331976
9.765625,-16.9949034672061
11.71875,-19.2709033739316
13.671875,-18.9948581866681

//Header with Test2 Info
//More Test2 Info
0,-156.875956035285
1.953125,-4.82866496038806
3.90625,-8.93502887648155
5.859375,-9.76964479822559
7.8125,-14.9767168331976
9.765625,-16.9949034672061
11.71875,-19.2709033739316
13.671875,-18.9948581866681

//Header with Test3 Info
//More Test3 Info
0,-156.875956035285
1.953125,-4.82866496038806
3.90625,-8.93502887648155
5.859375,-9.76964479822559
7.8125,-14.9767168331976
9.765625,-16.9949034672061
11.71875,-19.2709033739316
13.671875,-18.9948581866681

// End of Data

That's the gist of it, except there are about 25,000 entries under each header instead of 8. I'm running 25 tests, which need to be averaged together into one set of data.

Essentially, I want to parse through the data in this sequence:

  1. Skip first line
  2. Recognize empty line, go to next
  3. Check for "End of Data"
  4. If not the end, skip current line and the next line
  5. Create new array for current set of test data
  6. Read data until empty line is reached, then go back to step 2

Then, I want to average all of these sets together in the most efficient way.

I'm having trouble reading the data. I know I could use csvread, or a more general function for reading delimited values, but I'm stuck on finding an elegant and concise way to do everything.

I started with this:

function [ data ] = graph( input_args )
%Plot data

myData = fopen('mRoom_fSweep_25points_center.txt');
data = textscan(myData,'%s');
fclose(myData);
length(data)
end

I figured I could find the length of this array of strings and work out a for-loop for the whole list of operations, but I couldn't get past this point. The output kept giving me this:

ans = 
    {772321x1 cell}

I can't use this. When I try to store the length in a variable, it has a value of 1. Is there something about cell arrays that I'm missing here?


Solution

  • I assume you need the information in the "test info" lines?

    If so, you need to run textscan with two different patterns: one to pick out the info lines, and one to read the data:

     info(1, end+1) = textscan(fid, '//%s','Delimiter', '');
     data(1, end+1) = textscan(fid, '%f, %f', 'CollectOutput', true);
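
    Each of these calls returns a 1x1 cell array (one cell per format specifier, with 'CollectOutput' merging the two numeric columns into one matrix), which is why the result can be assigned into info(1, end+1) or data(1, end+1) directly. It is also why length(data) in the question prints 1: textscan(fid, '%s') returns a 1x1 cell whose single element holds all the tokens. A minimal sketch, using a made-up character vector txt in place of the file:

    ```matlab
    % txt is a hypothetical stand-in for the first few lines of the file
    txt = sprintf('//Header\n//Info\n0,-1.5\n1.953125,-4.8\n');

    C = textscan(txt, '%s');   % one format specifier -> 1x1 cell array
    outer = numel(C);          % size of the outer cell array
    tokens = C{1};             % brace indexing unwraps the inner cell array of strings
    inner = numel(tokens);     % number of whitespace-separated tokens read
    ```

    So the count the question was after would be numel(data{1}) rather than length(data).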
    

    Below is how I would wrap this in a loop with error handling:

    % [info, data] = read_data(file_name): Read a file in funky format
    % 
    % info and data are cells of same size
    function [info, data] = read_data(file_name)
        [fid, msg] = fopen(file_name);
        if fid<0
            error('Unable to open file "%s": %s', file_name, msg);
        end
        % close the file no matter how we exit this function (error,
        % ctrl-c,...)
        finalize = onCleanup(@() fclose(fid));
    
        info = cell(1,0);
        data = cell(1,0);
        while true
            info(1, end+1) = textscan(fid, '//%s','Delimiter', '');
            data(1, end+1) = textscan(fid, '%f, %f', 'CollectOutput', true);
    
            if strcmpi(info{1,end}{end}, 'End of Data')
                % End of data reached, exit here
                info = info(1:end-1);
                data = data(1:end-1);
                break;
            end
            if isempty(data{1,end})
                % Empty data, but not 'End of data' marker.
                % Replace this error with break to accept files with missing
                % "end of data" tags
                error('Empty data before "End of Data" line')
            end
        end
    end
    

    Then you can read the file and compute the average as follows:

    >> [info, data] = read_data('foo.txt')
    info = 
        {3x1 cell}    {2x1 cell}    {2x1 cell}
    data = 
        [8x2 double]    [8x2 double]    [8x2 double]
    
    
    >> info{3}
    ans = 
        'Header with Test3 Info'
        'More Test3 Info'
    
    >> all_data = cellfun(@(d) d(:,2), data, 'UniformOutput', false); all_data = [all_data{:}]
    all_data =
     -156.8760 -156.8760 -156.8760
       -4.8287   -4.8287   -4.8287
       -8.9350   -8.9350   -8.9350
       -9.7696   -9.7696   -9.7696
      -14.9767  -14.9767  -14.9767
      -16.9949  -16.9949  -16.9949
      -19.2709  -19.2709  -19.2709
      -18.9949  -18.9949  -18.9949
    
    >> mean(all_data, 2)
    ans =
     -156.8760
       -4.8287
       -8.9350
       -9.7696
      -14.9767
      -16.9949
      -19.2709
      -18.9949
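
    If the first column should be kept alongside the averaged values, one alternative is to stack the blocks along a third dimension and average across it. This is only a sketch: it assumes every test block has the same number of rows and the same first column, and the data cell below is a made-up stand-in for the output of read_data:

    ```matlab
    % Hypothetical stand-in for the cell array returned by read_data
    x = (0:3)' * 1.953125;
    data = {[x, [-1; -2; -3; -4]], [x, [-3; -4; -5; -6]], [x, [-5; -6; -7; -8]]};

    % Stacking only works if every block has the same number of rows
    rows = cellfun(@(d) size(d, 1), data);
    assert(all(rows == rows(1)), 'Test blocks have different numbers of rows');

    stacked = cat(3, data{:});   % rows x 2 x number of tests
    avg = mean(stacked, 3);      % rows x 2, averaged across tests
    ```

    Here avg(:,1) is the shared first column and avg(:,2) matches what mean(all_data, 2) computes above.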