I'm running into a slightly confusing issue.
Simplified query: (Assume ID
is primary key in each table)
SELECT
A.ID,
A.Data,
B.Data,
C.Data
FROM
A FULL OUTER JOIN
B ON A.ID = B.ID FULL OUTER JOIN
C ON A.ID = C.ID
I'm using FULL OUTER JOIN
's because in my situation there is no guarantee that any ID
is in all three tables, but if it is in more than one table, I want to see all the data for it in one row.
Here's the problem that I quickly realized: If there is an ID
that is in both tables B
and C
(but not A
), then you run into the following issues:
The ID
field is NULL
for ID
's that don't appear in table A
. This makes sense, since the query selects A.ID
. I found a pretty easy way around this, which is to use COALESCE
(i.e. COALESCE(A.ID,B.ID,C.ID)
).
The data for ID
's that aren't in table A
are returned in two separate rows. (One row has NULL
's for B
's data and the other has NULL
's for C
's data.) After thinking about it, this also makes sense because of the way the query above is written. Both tables B
and C
join based off of table A
, so if the ID
isn't in table A
, then the query has no relationship to join that ID
in tables B
and C
. I found a way around this as well, which is to explicitly specify the relationship to every table before it in the ON
clause, separated by OR
's.
So making the following changes will fix these two problems:
SELECT
COALESCE(A.ID,B.ID,C.ID),
A.Data,
B.Data,
C.Data
FROM
A FULL OUTER JOIN
B ON A.ID = B.ID FULL OUTER JOIN
C ON A.ID = C.ID OR B.ID = C.ID
This works fine, but it took me some time to figure this out, and future personnel who run across this query might it strange, because using COALESCE
and a bunch of OR
's in the ON
clause at first glance seems superfluous, but actually both are needed.
This can also get very confusing for larger queries, because the size of the ON
clause is compounded for each table that joins this way.
My question is: Is there some other built-in way or other trick to deal with OUTER JOIN
's of this type that already take into account these extra conditions that you don't need to consider for INNER JOIN
's?
This is another way. It's not necessarily more or less complicated or performant. You'd need to check for your case. As already stated, having to do this may indicate a modelling issue.
SELECT ID, MAX(Data1), MAX(Data2), MAX(Data3)
FROM
(
SELECT A.ID, A.Data1, NULL, NULL
FROM A
UNION ALL
SELECT B.ID, NULL, B.Data2, NULL
FROM B
UNION ALL
SELECT C.ID, NULL, NULL, C.Data3
FROM C
) T
GROUP BY ID