Search code examples
matlabjoininner-joinnan

MATLAB inner join two tables treating NaN entries as identical


I would like to innerjoin multiple tables where key columns contain many NaN values, and merging the tables remove these entries since NaNs are not considered identical.

I would like to innerjoin tables treating key columns' NaN entries identical. Below example demonstrates the problem I am having:

>> A = table({'a' 'b' 'd' 'e'}',[123, 456, 789, NaN]',[4 5 6 7]', 'VariableNames', {'Key1', 'Key2', 'Var1'})
A =
  4×3 table
    Key1    Key2    Var1
    ____    ____    ____
    'a'     123      4  
    'b'     456      5  
    'd'     789      6  
    'e'     NaN      7  

>> B = table({'a' 'b' 'd' 'e'}', [123, 456, 789, NaN]', [1 2 3 4]', 'VariableNames', {'Key1', 'Key2', 'Var2'})
B =
  4×3 table
    Key1    Key2    Var2
    ____    ____    ____
    'a'     123      1  
    'b'     456      2  
    'd'     789      3  
    'e'     NaN      4  

>> innerjoin(A,B)
ans =
  3×4 table
    Key1    Key2    Var1    Var2
    ____    ____    ____    ____
    'a'     123      4       1  
    'b'     456      5       2  
    'd'     789      6       3  

Desired output is below where Key columns NaN values are treated identical hence performing inner join on the entry where Key1 is 'e' and Key2 is NaN.

>> innerjoin(A,B)
ans =
  3×4 table
    Key1    Key2    Var1    Var2
    ____    ____    ____    ____
    'a'     123      4       1  
    'b'     456      5       2  
    'd'     789      6       3  
    'e'     NaN      7       4

Any help will be greatly appreciated!


Solution

  • You can replace NaN by inf, apply innerjoin, and then replace back:

    A.Key2(isnan(A.Key2)) = inf;
    B.Key2(isnan(B.Key2)) = inf;
    result = innerjoin(A,B);
    result.Key2(isinf(result.Key2)) = NaN;