Background
Recently my company upgraded from Oracle 11g to Exadata. All production objects and data were successfully migrated to the new system and were verified to be a 1:1 match between systems. Immediately after the first daily set of ETL processes were completed on our new system we discovered our reporting tables were significantly smaller than expected. Upon further investigation we discovered that a batch ID being added to LEFT OUTER joins were causing the problem where they had worked perfectly on 11g.
Question
Why would the following query be treated as a LEFT OUTER JOIN on 11g, but an INNER JOIN on Exadata?
SELECT
*
FROM DIM_CALL CALLS
LEFT OUTER JOIN FACT_ACTVY_MNGMT ACTVY_MNGMT
ON ACTVY_MNGMT.CALL_KEY = CALLS.CALL_KEY
AND ACTVY_MNGMT.BATCH_ID = 20141112
LEFT OUTER JOIN DIM_ACTVY ACTVY
ON ACTVY.ACTVY_KEY = ACTVY_MNGMT.ACTVY_KEY
AND ACTVY_MNGMT.BATCH_ID = 20141112
Update
It appears there was either a typo in the query used in the ETL process or the original developer just overlooked this case. If you look at the last join you'll notice that the join is on the ACTVY_KEY and BATCH_ID. The issue is that the BATCH_ID it's referencing is from the ACTVY_MNGMT table. The database essentially treats this as a WHERE clause, so any case in which CALL_KEY is NULL results in a failure.
There is a minor programming mistake here, but it also looks like there is an optimizer bug. Think of a LEFT JOIN
as operating on the previous set, not just the previous table. Adding a duplicate predicate to a different LEFT JOIN
should not make any difference.
My guess is there is an optimizer or parser bug with some fancy Exadata feature, such as smart scan.
This code is similar to your problem. However, it does not reproduce the error for me, on 11.2.0.3 EE, on Solaris. And it may not even reproduce the error for you, on Exadata. It's just meant to show that the problem you've described is a bug. Avoiding the bug, like you're doing, is usually the best approach. But you may also want to create a service request with Oracle support to look into the issue. That same bug may be affecting other code in less obvious ways.
with table1 as (select '1' a, '1' b from dual),
table2 as (select '1' a, '2' b from dual),
table3 as (select '1' a, '2' b from dual)
select *
from table1
left join table2
on table1.a = table2.a
and table2.b = 3
left join table3
on table2.a = table3.a;
with table1 as (select '1' a, '1' b from dual),
table2 as (select '1' a, '2' b from dual),
table3 as (select '1' a, '2' b from dual)
select *
from table1
left join table2
on table1.a = table2.a
and table2.b = 3
left join table3
on table2.a = table3.a
--This predicate is logically redundant but does *not* change results.
and table2.b = 3;
Both queries return this result on Oracle 11.2.0.3, EE, on Solaris:
A B A B A B
- - - - - -
1 1