
Creative way to create a MATLAB array from a text file with multiple headers


I am trying to parse a molecular dynamics dump file in which headers are printed periodically. Between two successive headers there is data in column format that I want to store and post-process; the number of rows is not guaranteed to be the same from one block to the next. Is there a way to do this without excessive use of for loops?

The file looks roughly like this:

ITEM: TIMESTEP
0
ITEM: NUMBER OF ENTRIES
1079
ITEM: BOX BOUNDS xy xz yz ff ff pp
-1e+06 1e+06 0
-1e+06 1e+06 0
-1e+06 1e+06 0
ITEM: ENTRIES index c_1[1] c_1[2] c_2[1] c_2[2] c_2[3] c_2[4] c_2[5] 
1 1 94 0.0399999 0 0.171554 -0.00124379 0 
2 1 106 0.0399999 0 -0.0638316 0.116503 0 
3 1 204 0.0299999 0 -0.124742 0.0290103 0 
4 1 675 0.0299999 0 0.0245382 -0.116731 0 
5 2 621 0.03 0 0.0328324 0.00185942 0 
6 2 656 0.04 0 -0.0315086 0.016237 0 
7 2 671 0.04 0 -0.00291159 -0.0169882 0 
8 3 76 0.03 0 0.01775 0.0100646 0 
9 3 655 0.03 0 0.00434063 -0.00750336 0 
.
.
.
.
.
1076 678 692 100000 0 -0.222481 -1.44632e-06 0 
1077 679 692 100000 0 -0.00232206 -8.05951e-09 0 
1078 682 691 100000 0 0.0753935 -2.89438e-07 0 
1079 687 692 100000 0 -0.0153246 -2.51076e-08 0 
ITEM: TIMESTEP
1000
ITEM: NUMBER OF ENTRIES
1078
ITEM: BOX BOUNDS xy xz yz ff ff pp
-1e+06 1e+06 0
-1e+06 1e+06 0
-1e+06 1e+06 0
ITEM: ENTRIES index c_1[1] c_1[2] c_2[1] c_2[2] c_2[3] c_2[4] c_2[5] 
1 1 94 0.0399997 0 1.3535 -0.00981109 0 
2 1 106 0.0399986 0 -6.36969 11.6275 0 
3 1 204 0.0299893 0 -236.114 54.9339 0 
4 1 675 0.0299998 0 0.148064 -0.704365 0 
.
.
.
.

TIA!


Solution

  • You don't need to write a single for loop to parse this file; cellfun and arrayfun run the loops for you:

    [headers, tables] = parseTables('tables.txt')
    
    ...
    
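    function [headers, tables] = parseTables(filename)
    % Split a text file into header lines and the numeric blocks that follow them.
    content = fileread(filename); % read the whole file as one char vector
    lines = splitlines(content); % one cell per line
    lines = lines(~cellfun(@isempty, strtrim(lines))); % drop blank lines, e.g. after a trailing newline
    values = cellfun(@str2num, lines, 'UniformOutput', false); % numeric row per line, [] if not numeric (str2num uses eval, so only run it on trusted files)
    headerLines = cellfun(@isempty, values); % true for lines that did not parse as numbers
    headers = lines(headerLines); % extract headers
    startLines = find(headerLines)+1; % index of the first data line after each header
    endLines = [startLines(2:end)-2; length(values)]; % index of the last data line before the next header
    tables = arrayfun(@(i, j) cell2mat(values(i:j)), ...
        startLines, endLines, 'UniformOutput', false); % stack the rows of each block into one matrix
    end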
    

    The results are returned as two cell arrays of equal length, where tables{k} holds the numeric rows that follow headers{k}:

    headers =
    
      8×1 cell array
    
        {'ITEM: TIMESTEP'                                                       }
        {'ITEM: NUMBER OF ENTRIES'                                              }
        {'ITEM: BOX BOUNDS xy xz yz ff ff pp'                                   }
        {'ITEM: ENTRIES index c_1[1] c_1[2] c_2[1] c_2[2] c_2[3] c_2[4] c_2[5] '}
        {'ITEM: TIMESTEP'                                                       }
        {'ITEM: NUMBER OF ENTRIES'                                              }
        {'ITEM: BOX BOUNDS xy xz yz ff ff pp'                                   }
        {'ITEM: ENTRIES index c_1[1] c_1[2] c_2[1] c_2[2] c_2[3] c_2[4] c_2[5] '}
    
    
    tables =
    
      8×1 cell array
    
        {[        0]}
        {[     1079]}
        { 3×3 double}
        {13×8 double}
        {[     1000]}
        {[     1078]}
        { 3×3 double}
        { 4×8 double}
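
    For post-processing, the two cell arrays can be split per frame by matching on the header text, again without explicit loops. The following is only a minimal sketch: the small headers and tables values are hypothetical stand-ins for the output of parseTables, and the variable names (isTime, isEntries, timesteps, frames, nRows, declared) are illustrative rather than part of the answer above. It assumes a MATLAB release that provides startsWith (R2016b or later).

```matlab
% Hypothetical stand-in for the output of parseTables (two short frames).
headers = {'ITEM: TIMESTEP'; 'ITEM: NUMBER OF ENTRIES'; 'ITEM: ENTRIES index c_1[1]'; ...
           'ITEM: TIMESTEP'; 'ITEM: NUMBER OF ENTRIES'; 'ITEM: ENTRIES index c_1[1]'};
tables  = {0; 2; [1 1; 2 1]; 1000; 1; [1 1]};

% Logical masks selecting each kind of header.
isTime    = startsWith(headers, 'ITEM: TIMESTEP');
isCount   = startsWith(headers, 'ITEM: NUMBER OF ENTRIES');
isEntries = startsWith(headers, 'ITEM: ENTRIES');

timesteps = cell2mat(tables(isTime));          % one timestep per frame
declared  = cell2mat(tables(isCount));         % row count announced in the file
frames    = tables(isEntries);                 % one data matrix per frame
nRows     = cellfun(@(t) size(t, 1), frames);  % rows actually parsed per frame

% Sanity check: the parsed row counts should match the declared ones.
countsMatch = isequal(declared, nRows);
```

    Keeping frames as a cell array, rather than one large matrix, accommodates blocks with different numbers of rows; frames{k} is then the data recorded at timesteps(k).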