How to read a data file with different numbers of columns


I'm trying to read a data file (*.txt) in MATLAB. Each record in this file should have 19 columns (separated by tabs or spaces). However, the program that writes the file "wraps" each record: the first 15 columns appear on one line and the remaining 4 columns go onto the next line. After a certain number of lines, the structure changes to 6 columns per line. I'm adding the following screenshot for ease of explanation (it is not taken from the shared test file, but it has the same structure); the attached data file should explain things further.

[Screenshot: excerpt of a data file with the same structure]

As can be seen, lines 1 to 7556 hold the 19-column records (15 values on one line and 4 on the next, wrapped), and the lines from 7557 onward have 6 columns. These two structures repeat throughout the data file. Here is the pastebin link for the sample test file.

I tried the following code with no luck.

fid = fopen('test.txt');
C = cell2mat(textscan(fid, '%f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f %f'));
fclose(fid);

How can I read the data, and maybe get two separate readings (or data sets) that include the two different data structures?


Solution

  • Since you have consecutive delimiters, and delimiters at the start of some rows, it's not entirely straightforward to use MATLAB's normal CSV/text read functions. What does work is reading the entire file into one matrix and then using the number of missing values (NaNs) in each row to decide which dataset that row belongs to. See the code comments for an explanation.

    % read the text file, treating runs of consecutive spaces as a single delimiter
    data = readmatrix('test.txt','Delimiter',' ','ConsecutiveDelimitersRule', 'join');
    
    n_nans = sum(isnan(data),2); % identify number of nans per row
    
    % build logical row masks that say which dataset each row belongs to
    rows_set1_1 = n_nans == 1;  % one nan per full (15-value) row of the first dataset
    rows_set1_2 = n_nans == 12; % 12 nans per row for the 'wrapped' (4-value) lines
    rows_set2 = n_nans == 10;   % 10 nans per row for the 6-value rows of the second dataset
    
    % divide data in two datasets
    % join each full row with its wrapped continuation side by side
    % (this requires the same number of full and wrapped rows)
    dataset1 = [data(rows_set1_1, :), data(rows_set1_2, :)];
    dataset2 = data(rows_set2, :);
    
    % remove nan columns
    dataset1(:,all(isnan(dataset1),1)) = []; 
    dataset2(:,all(isnan(dataset2),1)) = [];
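
As a cross-check on the NaN-counting approach above, here is a minimal sketch that classifies each line by how many numbers it contains instead. This is an alternative, not the method used in the answer. It assumes that every 15-value line is directly followed by its 4-value continuation, that lines in the second dataset always have exactly 6 values, and that `readlines` is available (R2020b or newer). The sample lines are made up so that the snippet runs on its own.

```matlab
% Made-up stand-in for the file contents. For the real file, use instead:
%   lines = readlines('test.txt');   % requires R2020b or newer
lines = [
    " 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15"
    " 16 17 18 19"
    " 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115"
    " 116 117 118 119"
    " 1 2 3 4 5 6"
    " 7 8 9 10 11 12"
    ];

% Drop leading/trailing whitespace and any empty lines
lines = strtrim(lines);
lines(strlength(lines) == 0) = [];

% Parse every line into a numeric row vector
vals = arrayfun(@(s) sscanf(s, '%f').', lines, 'UniformOutput', false);

% Classify lines by the number of values they hold
counts = cellfun(@numel, vals);
isFull = counts == 15;   % first 15 columns of a dataset-1 record
isWrap = counts == 4;    % wrapped last 4 columns of a dataset-1 record
isSet2 = counts == 6;    % complete dataset-2 record

% Each full line needs exactly one continuation line
assert(nnz(isFull) == nnz(isWrap), 'Unmatched wrapped lines in dataset 1.');

% Reassemble the 19-column records and collect the 6-column records
dataset1 = [vertcat(vals{isFull}), vertcat(vals{isWrap})];
dataset2 = vertcat(vals{isSet2});
```

Counting parsed values rather than NaNs means a genuine NaN entry in the file does not change how its line is classified, although a line with an unexpected number of values is silently ignored by both approaches.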