Search code examples
sqlpostgresqljoinsql-order-bygreatest-n-per-group

Join on multiple columns and in one of the integer columns join by choosing minimum difference


I got table t1 and i want to join it with table t2 below on columns a, b and c

+---------+---------+---------+
|a        |b        |c        |
+---------+---------+---------+
|473200   |1        |1.-1-1   |
|472400   |10       |1.-1-1   |
|472800   |10       |1.-1-1   |
|473200   |93       |1.-1-1   |
|472800   |26240    |1.-1-1   |
+---------+---------+---------+

t2

+---------+---------+---------+
|a        |b        |c        |
+---------+---------+---------+
|473200   |1        |1.-1-1   |
|472400   |10       |1.-1-1   |
|472800   |10       |1.-1-1   |
|473200   |93       |1.-1-1   |
|472800   |26250    |1.-1-1   |
+---------+---------+---------+

When I join only on a and c the result is

+---------+---------+---------+---------+
|t1.b     |t2.b     |a        |c        |
+---------+---------+---------+---------+
|93       |1        |473200   |1.-1-1   |
|1        |1        |473200   |1.-1-1   |
|10       |10       |472400   |1.-1-1   |
|10       |10       |472800   |1.-1-1   |
|26240    |10       |472800   |1.-1-1   |
|93       |93       |473200   |1.-1-1   |
|1        |93       |473200   |1.-1-1   |
|10       |26250    |472800   |1.-1-1   |
|26240    |26250    |472800   |1.-1-1   |
+---------+---------+---------+---------+

What I try to achieve is to add column b to 'on' clause so that join happens on minimum difference in b column.

Desired result

+---------+---------+---------+---------+
|t1.b     |t2.b     |a        |c        |
+---------+---------+---------+---------+
|1        |1        |473200   |1.-1-1   |
|10       |10       |472400   |1.-1-1   |
|10       |10       |472800   |1.-1-1   |
|93       |93       |473200   |1.-1-1   |
|26240    |26250    |472800   |1.-1-1   |
+---------+---------+---------+---------+

I saw something similar here

https://dba.stackexchange.com/questions/73804/how-to-retrieve-closest-value-based-on-look-up-table

but not sure how to apply to my case.


Solution

  • Join the tables and calculate the differences for column c, then use distinct on to return only one row per (a, c) ordered by the difference.

    with joined as (
      select t1.a, t1.c, t1.b as b1, t2.b as b2, t2.b - t1.b as b_diff
        from t1
             join t2 
               on t2.a = t1.a
              and t2.b = t1.b
              and t1.b <= t2.b
    )
    select distinct on (a, c) b1, b2, a, c
      from joined
     order by a, c, b_diff
    ;