Search code examples
sqldatabasepostgresqlamazon-web-servicesamazon-redshift

How to match these 2 tables?


Here is the first table A in Redshift:

team     origin_id     targte_id
a        1             11
b        2             22
b        NULL          33
c        5             55
c        NULL          66

The origin_id and target_id could be NULL, but they couldn't both NULL in one row.

And the second table B:

team     origin_id     targte_id     content
a        1             11            aaa
a        1             11            bbb
b        NULL          22            xxx
c        5             NULL          zzz

I want to join these 2 tables and match the value of content to the table A. The match rule is: A.team_id=B.team_id; if the origin_id is not null, then A.origin_id=B.origin_id; if the origin_id is null, then A.target_id=B.target_id. And the row count of the final result should be the same as the table A.

The final result set I wanted is:

team     origin_id     targte_id     content
a        1             11            aaa
b        2             22            xxx
b        NULL          33            NULL
c        5             55            zzz
c        NULL          66            NULL

(For the first row team=a, it matches 2 row in table B; just choose any one in these matched rows, and it is OK.)

What's the correct SQL to achieve this in Redshift? I try left join but the result looks like a Cartesian Product of these 2 tables.


Solution

  • Left join is correct. However, in your example where

    For the first row team=a, it matches 2 row in table B; just choose any one in these matched rows, and it is OK.

    There is no way to "just choose any one", a JOIN operation will match both rows. It is up to you to de-dupe the result set afterwards.

    For example, you can group by A.team_id, A.origin_id, A.target_id:

    SELECT
      team_id,
      origin_id,
      target_id,
      MIN(content) any_content
    FROM (
      SELECT
        A.team_id,
        A.origin_id,
        A.target_id,
        B.content
      FROM
        A LEFT JOIN B
      ON
        A.team_id = B.team_id AND
        COALESCE(A.origin_id, 'is_null') = COALESCE(B.origin_id, 'is_null') AND
        COALESCE(A.target_id, 'is_null') = COALESCE(B.target_id, 'is_null')
    )
    GROUP BY
      team_id,
      origin_id,
      target_id