Tags: matlab, datetime, matlab-table

Unify timestamps as date strings


MATLAB R2015b

I have a table containing a date string and a time string in various formats in two columns for each row:

11.01.2016 | 00:00:00 | data

10/19/16 | 05:29:00 | data

12.02.16 | 06:40 | data

I want to convert these two columns into one column with a common format:

31.12.2017 14:00:00

My current solution loops over each row: it combines the two columns as strings, checks which of the formats applies so it can call datetime with the appropriate format string, and then calls datestr with the desired format string. datetime was not able to determine the format of the input string automatically.

As you can imagine, this is horribly slow for large tables (approx. 50000 rows).

Is there any faster solution?

Thanks in advance.


Solution

  • I tried to vectorize the code. The trick is to

    1. convert tables > cell > char-array, then
    2. manipulate char strings, then
    3. convert back from char-array > cell > table

    It is also important to pad the shorter strings with null characters (char(0)) so that all cells have the same length. Without this padding, the cell > char-array conversion is not possible. Here is the code:

    clc
    clear all

    %% create Table T
    d={'11.01.2016';
       '10/19/16';
       '12.02.16'};
    
    t={'00:00:00';
      '05:29:00';
      '06:40'};
    dat=[123;
        456;
        789];
    
    T = table(d,t,dat);
    
    %% deal with dates in Table T
    % separate date column and convert to cell
    dd = table2cell(T(:,1));
    % equalize the lengths of all elements of the cell
    % by padding the shorter dates with null characters at the end
    nmax=max(cellfun(@numel,dd));
    func = @(x) [x,zeros(1,nmax-numel(x))];
    temp1 = cellfun(func,dd,'UniformOutput',false);
    % convert to array for vectorized manipulation of char strings
    ddd=cell2mat(temp1);
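    % replace the separators in the 3rd and 6th positions with '.' (period)
    % note: this only swaps the separator; '10/19/16' (month first) becomes
    % '10.19.2016', the two leading fields are not reordered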
    ddd(:,[3 6]) = repmat(['.' '.'], length(dd),1);
    % find indexes of the shorter dates (two-digit year): their last column
    % still holds the null padding
    short_year_idx = find(uint16(ddd(:,nmax)) == 0);
    % find the year value for those short_year cases
    yy = ddd(short_year_idx,[7 8]);
    % overwrite the two-digit year and the null padding with '20XX'
    ddd(short_year_idx,7:nmax) = ...
        [repmat('20',size(short_year_idx,1),1) yy];
    % convert char array back to cell and replace in table
    dddd = mat2cell(ddd,ones(1,size(d,1)),nmax);
    T(:,1) = table(dddd);
    
    %% deal with times in Table T
    % separate time column and convert to cell
    tt = table2cell(T(:,2));
    % equalize the lengths of all elements of the cell
    % by padding the shorter times with null characters at the end
    nmax=max(cellfun(@numel,tt));
    func = @(x) [x,zeros(1,nmax-numel(x))];
    temp1 = cellfun(func,tt,'UniformOutput',false);
    % convert to array for vectorized manipulation of char strings
    ttt=cell2mat(temp1);
    % find indexes of the shorter times (assuming only the trailing ':00' is missing)
    short_time_idx = find(uint16(ttt(:,nmax)) == 0); % dirty hack: the null pad is 0 in ASCII
    % replace the null characters with the ':00' string
    ttt(short_time_idx,[6 7 8]) = repmat(':00',size(short_time_idx,1),1);
    % convert char array back to cell and replace in table
    tttt = mat2cell(ttt,ones(1,size(t,1)),nmax);
    T(:,2) = table(tttt);
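
The code above normalizes the two columns but stops short of joining them into the single timestamp column the question asks for. As a rough alternative sketch (not the answer's method), the same normalization can be written with regexprep, which operates on a whole cell array of strings at once, followed by strcat to join the columns. It assumes that slash-separated dates are month/day/year (as '10/19/16' suggests), that every two-digit year belongs to 20XX, and that a short time is only missing its seconds. The variable name ts is made up for this illustration.

```matlab
% Same sample data as in the answer (cell arrays of char vectors)
d = {'11.01.2016'; '10/19/16'; '12.02.16'};
t = {'00:00:00'; '05:29:00'; '06:40'};

% Assumption: slash dates are MM/dd/yy, so swap the first two fields
% and switch to '.' as the separator
d = regexprep(d, '^(\d{2})/(\d{2})/(\d+)$', '$2.$1.$3');

% Assumption: a two-digit year at the end means 20XX
% (a four-digit year does not match because of the end anchor)
d = regexprep(d, '\.(\d{2})$', '.20$1');

% Assumption: a time given as HH:mm is only missing ':00' for the seconds
t = regexprep(t, '^(\d{2}:\d{2})$', '$1:00');

% Join date and time; with cell inputs strcat keeps the single space
ts = strcat(d, {' '}, t);
```

With the table T from the answer, the inputs would be T.d and T.t, and ts could be stored back as a new table variable. If datetime values are needed rather than strings, ts can probably be passed to datetime in one call with 'InputFormat','dd.MM.yyyy HH:mm:ss', since every row now shares one format.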