Search code examples
sql-serversql-server-2008outer-join

Odd FULL OUTER JOIN "ON" Clause?


I'm running into a slightly confusing issue.

Simplified query: (Assume ID is primary key in each table)

SELECT
A.ID,
A.Data,
B.Data,
C.Data
FROM
A FULL OUTER JOIN
B ON A.ID = B.ID FULL OUTER JOIN
C ON A.ID = C.ID

I'm using FULL OUTER JOIN's because in my situation there is no guarantee that any ID is in all three tables, but if it is in more than one table, I want to see all the data for it in one row.

Here's the problem that I quickly realized: If there is an ID that is in both tables B and C (but not A), then you run into the following issues:

  1. The ID field is NULL for ID's that don't appear in table A. This makes sense, since the query selects A.ID. I found a pretty easy way around this, which is to use COALESCE (i.e. COALESCE(A.ID,B.ID,C.ID)).

  2. The data for ID's that aren't in table A are returned in two separate rows. (One row has NULL's for B's data and the other has NULL's for C's data.) After thinking about it, this also makes sense because of the way the query above is written. Both tables B and C join based off of table A, so if the ID isn't in table A, then the query has no relationship to join that ID in tables B and C. I found a way around this as well, which is to explicitly specify the relationship to every table before it in the ON clause, separated by OR's.

So making the following changes will fix these two problems:

SELECT
COALESCE(A.ID,B.ID,C.ID),
A.Data,
B.Data,
C.Data
FROM
A FULL OUTER JOIN
B ON A.ID = B.ID FULL OUTER JOIN
C ON A.ID = C.ID OR B.ID = C.ID

This works fine, but it took me some time to figure this out, and future personnel who run across this query might it strange, because using COALESCE and a bunch of OR's in the ON clause at first glance seems superfluous, but actually both are needed.

This can also get very confusing for larger queries, because the size of the ON clause is compounded for each table that joins this way.

My question is: Is there some other built-in way or other trick to deal with OUTER JOIN's of this type that already take into account these extra conditions that you don't need to consider for INNER JOIN's?


Solution

  • This is another way. It's not necessarily more or less complicated or performant. You'd need to check for your case. As already stated, having to do this may indicate a modelling issue.

    SELECT ID, MAX(Data1), MAX(Data2), MAX(Data3)
    FROM
    (
    SELECT A.ID, A.Data1, NULL, NULL
    FROM A
    UNION ALL
    SELECT B.ID, NULL, B.Data2, NULL
    FROM B
    UNION ALL
    SELECT C.ID, NULL, NULL, C.Data3
    FROM C
    ) T
    GROUP BY ID