Search code examples
matlabouter-jointimetable

Timetable outerjoin without sorting on keys


I am trying to find a way to perform an outerjoin of two timetables containing the same variables, while not sorting the merged rows of keys based on value. I want to give preference to values from table1 (high quality data) over table2 (low quality data) where the same time sample is present in both tables.

The closest I can find is Merge tables Without sorting on Keys, but this does not work for my situation as I must order the timetable by time in order to use retime, which I need to condense the duplicated time samples.

e.g.

table1 = array2timetable([2;5], 'RowTimes', datetime(2000:2001,1,1), 'VariableNames', {'A'})
table2 = array2timetable([1;3], 'RowTimes', datetime(2001:2002,1,1), 'VariableNames', {'A'})

mergedTable = outerjoin(table1, table2, 'MergeKeys', true, 'Keys', {'Time', 'A'})

table1 =

  2×1 timetable

       Time        A
    ___________    _

    01-Jan-2000    2
    01-Jan-2001    5


table2 =

  2×1 timetable

       Time        A
    ___________    _

    01-Jan-2001    1
    01-Jan-2002    3


mergedTable =

  4×1 timetable

       Time        A
    ___________    _

    01-Jan-2000    2
    01-Jan-2001    1
    01-Jan-2001    5
    01-Jan-2002    3

My desired output is:

       Time        A
    ___________    _

    01-Jan-2000    2 <- only in table1
    01-Jan-2001    5 <- table1 row first, regardless of value
    01-Jan-2001    1 <- table2 row second, regardless of value
    01-Jan-2002    3 <- only in table2

This would allow me to use retime to get a set of values for all available time samples, preferring table1 where they overlap:

retime(mergedTable, unique(mergedTable.Time), 'firstvalues')

       Time        A
    ___________    _

    01-Jan-2000    2
    01-Jan-2001    5
    01-Jan-2002    3

Maybe there's a better approach than outerjoin and retime?


Solution

  • You could add a column to both source tables with the table data quality "ranking". Include this column in your join. Then after the join you can re-sort by the quality ranking and then the time, and you'd end up with your desired ordering.

    table1 = array2timetable([2;5], 'RowTimes', datetime(2000:2001,1,1), 'VariableNames', {'A'})
    table2 = array2timetable([1;3], 'RowTimes', datetime(2001:2002,1,1), 'VariableNames', {'A'})
    
    % Assign quality score to the input tables
    table1.Quality(:) = 1;
    table2.Quality(:) = 2;
    
    mergedTable = outerjoin(table1, table2, 'MergeKeys', true, 'Keys', {'Time', 'A', 'Quality'})
    
    % Re-sort the table to prioritise higher quality data
    mergedTable = sortrows( mergedTable, {'Quality', 'Time'} )
    

    Output:

    mergedTable =
      4×2 timetable
            Time       A    Quality
        ___________    _    _______
        01-Jan-2000    2    1      
        01-Jan-2001    5    1      
        01-Jan-2001    1    2      
        01-Jan-2002    3    2   
    

    If you want to remove the helper column afterwards, that is easy

    mergedTable.Quality = [];