Search code examples
sqlmatchteradatanearest-neighborapproximate

Teradata SQL join with approximate match by nearest Neighbor


I have two tables table1 and table2 I need to join these tables with keys that are not exactly matched, they could differ in the last 3 or 4 digits there is no fixed pattern, the joining will have two conditions:

  • 1st Condition: table2.Key_2 must be numerically greater than table1.Key_1 so join will will search for nearest large Key
  • 2nd Condition: table1 could have two keys approximately matched with one key in table2 so the join will consider the nearest large one the other return Null

|Key_1             |
|------------------|
|111330317223636588|
|121199074223629055|
|47256714523667238 |
|101348062023590858|
|106331320423644206|
|106061712623646625|
|120562195823631381|
|155570817823642550|
|114948476223640334|
|103285939423669298|
|103285939423669910|
|Key_2              |Value|
|-------------------+-----|
|111330317223636610 |Done |
|121199074223629090 |Done |
|47256714523667300  |Done |
|101348062023590900 |Done |
|106331320423644305 |Done |
|106061712623647100 |Done |
|120562195823631399 |Done |
|155570817823642677 |Done |
|114948476223640455 |Done |
|103285939423669988 |Done |

the expected result will be like the below:

|Key_1              |Value|
|-------------------+-----|
|111330317223636588 |Done |
|121199074223629055 |Done |
|47256714523667238  |Done |
|101348062023590858 |Done |
|106331320423644206 |Done |
|106061712623646625 |Done |
|120562195823631381 |Done |
|155570817823642550 |Done |
|114948476223640334 |Done |
|103285939423669298 |NULL | 
|103285939423669910 |Done |

Solution

  • This join on the next value is hard to express as a join, but simple using this approach:

    with cte as
     ( -- combine both key columns into one 
       select key_1, cast(null as varchar(10)) as val from table1
       union all
       select key_2 as key_1, "value" from table2
      -- order by 1
     )
    select 
       key_1
       -- return the next rows value
      ,lead(val)
       over (order by key_1) as "value"
      ,val
    from cte
    -- filter only rows from the first table
    qualify val is null
    

    The cte combines both tables:

     Key_1              val <-- NULLs indicate rows from table_1
     ------------------ ----- 
      47256714523667238 NULL 
      47256714523667300 Done  
     101348062023590858 NULL  
     101348062023590900 Done  
     103285939423669298 NULL
     103285939423669910 NULL  
     103285939423669988 Done  
     106061712623646625 NULL  
     106061712623647100 Done  
     106331320423644206 NULL  
     106331320423644305 Done  
     111330317223636588 NULL  
     111330317223636610 Done  
     114948476223640334 NULL  
     114948476223640455 Done  
     120562195823631381 NULL  
     120562195823631399 Done  
     121199074223629055 NULL  
     121199074223629090 Done  
     155570817823642550 NULL  
     155570817823642677 Done  
    

    LEAD finds the next row's value:

     Key_1              val  value 
     ------------------ ---- ----- 
      47256714523667238 NULL Done 
      47256714523667300 Done NULL 
     101348062023590858 NULL Done 
     101348062023590900 Done NULL 
     103285939423669298 NULL NULL 
     103285939423669910 NULL Done 
     103285939423669988 Done NULL 
     106061712623646625 NULL Done 
     106061712623647100 Done NULL 
     106331320423644206 NULL Done 
     106331320423644305 Done NULL 
     111330317223636588 NULL Done 
     111330317223636610 Done NULL 
     114948476223640334 NULL Done 
     114948476223640455 Done NULL 
     120562195823631381 NULL Done 
     120562195823631399 Done NULL 
     121199074223629055 NULL Done 
     121199074223629090 Done NULL 
     155570817823642550 NULL Done 
     155570817823642677 Done NULL 
    

    and the final QUALIFY removes all rows from the 2nd table

     Key_1              value 
     ------------------ ----- 
      47256714523667238 Done 
     101348062023590858 Done 
     103285939423669298 NULL 
     103285939423669910 Done 
     106061712623646625 Done 
     106331320423644206 Done 
     111330317223636588 Done 
     114948476223640334 Done 
     120562195823631381 Done 
     121199074223629055 Done 
     155570817823642550 Done