Search code examples
excelmatlabdata-conversion

Convert txt file to excel file with headers in Matlab


I'm struggling on splitting a txt file made of 1 column of N blocks made of 1000 rows. The first 3 rows of each block is an header, followed by 1000 rows of data. Row 1001 will be the next block starting with the three headers in this way:

#Header1
#Header2_1,Header2_2
#Header3
Data1_1-Data1_2
...
Data1000_1-Data1001_2
Header1
Header2
Header3
Data1002_1-Data1002_2
...
Data2003_1-Data2003_2
:::

Each row of data is made of two sets, so I'd like to split into two columns the data, keeping each header in this way

         Block1           |           Block2             |       Block3 ... 
Header1                   | Header1
Header2 1_1 | Header2 1_2 | Header2 2_1 | Header2 2_2
Header3                   | Header3
Data1_1 | Data1_2         | Data1002_1 | Data1002_2
...                       | ...
Data1001_1 | Data1001_2   | Data2003_1 | Data2003_2

I tried with reshape but without success:

filename = 'text.txt';
data = splitlines(filename);
NRows = 1001;
NumBlocks = length(data)/NRows
Data_Reshaped = reshape(data,NRows,NumBlocks);

but it works partially, because I obtain a 1001 x 12 cell array and each cell has

Data1_1-Data1_2

with the - as delimiter. How can I split again this cell into two columns in ordered way and get a 1001x24 cell arrays? Than the result should be exported in excel but I guess writetable function should do the job.


Solution

  • You need to do three things to make your data flat/uniform enough to put into Excel

    1. Split the single-header rows into two headers, the 2nd of which is just an empty string but you need two columns so you can stack with others properly

    2. Split the two-header row into two headers, using , as the delimiter

    3. Split the data rows into two, using - as the delimiter

    I've created an example (at the bottom of this answer) with 3 data rows for testing:

    First, read in the data

    % Use fileread to get the entire file contents, split on newlines
    dat = fileread( 'text.txt' );
    dat = splitlines( dat );
    
    % Define the schema for this data: 3 header rows then 3 data rows
    nHeaders = 3;
    nDataRows = 3;
    nPerBlock = nHeaders + nDataRows;
    
    % Do an initial reshape to get each "block" into its own column of the cell
    dat = reshape( dat, nPerBlock, [] );
    

    Now we can do the numbered operations above to get a cell array with two-columns per cell:

    % Split single-item headers from 'Header1' into {'Header1',''}
    dat([1,3],:) = cellfun( @(x) {x,''}, dat([1,3],:), 'uni', 0 );
    
    % Split two-item headers from 'Header2 2_1,Header2 2_2' into {'Header2 2_1','Header2 2_2'}
    dat(2,:) = cellfun( @(x) strsplit( x, ',' ), dat(2,:), 'uni', 0 );
    
    % Split the data from 'Data1_1-Data1_2' into {'Data1_1','Data1_2'}
    dat(nHeaders+1:end,:) = cellfun( @(x) strsplit( x, '-' ), dat(nHeaders+1:end,:), 'uni', 0 );
    

    Now you need to do a final bit of restructuring to expand and flatten the cell array:

    dat = arrayfun( @(r) [dat{r,:}], 1:size(dat,1), 'uni', 0 );
    dat = vertcat( dat{:} );
    

    Final output:

    dat output

    Since your data is a cell array rather than a table at this point (despite looking a bit like a table), you probably want to use writecell() to write into Excel or a csv.


    Example data text.txt:

    #Header1
    #Header2 1_1,Header2 1_2
    #Header3
    Data1_1-Data1_2
    Data2_1-Data2_2
    Data3_1-Data3_2
    #Header1
    #Header2 2_1,Header2 2_2
    #Header3
    Data4_1-Data4_2
    Data5_1-Data5_2
    Data6_1-Data6_2
    #Header1
    #Header2 3_1,Header2 3_2
    #Header3
    Data7_1-Data7_2
    Data8_1-Data8_2
    Data9_1-Data9_2