On Oracle 10gR2, given the following query, it takes forever to run. This is because of all the correlated subqueries in the select clause. There has got to be a better way. I'm thinking that rewriting the correlated subqueries as joins would do the trick, or building the query in such a way as to only need one correlated subquery, or possibly using some type of analytic function, but so far it is beyond me. Any help would be much appreciated.
Here is the query:
SELECT COL_1,
TAB_1.COL_2 AS REPORT,
(SELECT COL_3
FROM TAB_2
WHERE TAB_2.COL_1 = TAB_1.COL_1) AS DEPOT,
(SELECT COUNT(DISTINCT( TAB_3.COL_4 ))
FROM TAB_3
WHERE TAB_3.COL_9 = TAB_1.COL_1
AND TAB_1.COL_2 = TAB_3.COL_6
AND TAB_3.COL_5 IS NULL
AND TAB_3.COL_8 = 'Parts Shortage') AS P_SHORTAGES,
(SELECT COUNT(DISTINCT( Trim(COL_10) ))
FROM TAB_3
WHERE TAB_3.COL_9 = TAB_1.COL_1
AND TAB_1.COL_2 = TAB_3.COL_6
AND TAB_3.COL_5 IS NULL
AND TAB_3.COL_8 = 'Parts Shortage') AS PARTS_AFFECTED,
(SELECT COUNT(TAB_3.COL_7)
FROM TAB_3
WHERE TAB_3.COL_7 = 1
AND TAB_3.COL_9 = TAB_1.COL_1
AND TAB_1.COL_2 = TAB_3.COL_6
AND TAB_3.COL_5 IS NULL
AND TAB_3.COL_8 = 'Parts Shortage') AS PARTS_CATEGORY1,
(SELECT COUNT(TAB_3.COL_4)
FROM TAB_3
WHERE TAB_3.COL_9 = TAB_1.COL_1
AND TAB_3.COL_6 = TAB_1.COL_2
AND TAB_3.COL_5 IS NULL
AND TAB_3.COL_8 = 'Unsrv Asset') AS U_SHORTAGES,
(SELECT COUNT(TAB_3.COL_10)
FROM TAB_3
WHERE TAB_3.COL_9 = TAB_1.COL_1
AND TAB_3.COL_6 = TAB_1.COL_2
AND TAB_3.COL_5 IS NULL
AND TAB_3.COL_8 = 'Unsrv Asset') AS U_AFFECTED,
(SELECT COUNT(DISTINCT( Trim(TAB_3.COL_7) ))
FROM TAB_3
WHERE TAB_3.COL_7 = 1
AND TAB_3.COL_9 = TAB_1.COL_1
AND TAB_3.COL_6 = TAB_1.COL_2
AND TAB_3.COL_5 IS NULL
AND TAB_3.COL_8 = 'Unsrv Asset') AS UNSRV_CAT1,
To_char(TAB_1.COL_11, 'MM/DD/YY') AS REPORT_DATE
FROM TAB_1;
For me, to make things less complicated I tried to rewrite just the following, but I'm still at a loss:
SELECT COL_1,
(SELECT COUNT(DISTINCT( TAB_3.COL_4 ))
FROM TAB_3
WHERE TAB_3.COL_9 = TAB_1.COL_1
AND TAB_1.COL_2 = TAB_3.COL_6
AND TAB_3.COL_5 IS NULL
AND TAB_3.COL_8 = 'Parts Shortage') AS P_SHORTAGES
FROM TAB_1;
This is quite a loaded question.. There is not enough info here (e.g. execution plan, record counts, available indexes, pysical location of data in table, etc) to be able to tune the query properly. The best anyone can do is give a guess. That being said.. here is my best guess:
select distinct
tab_1.col_1,
tab_1.col_2 AS report,
tab_2.col_3 AS depot,
COUNT (DISTINCT (case when tab_3.col_8 = 'Parts Shortage' then tab_3.col_4 end)) over (partition by tab_1.col_1, tab_1.col_2) AS p_shortages,
COUNT (DISTINCT (case when tab_3.col_8 = 'Parts Shortage' then TRIM (tab_3.col_10) end)) over (partition by tab_1.col_1, tab_1.col_2) AS parts_affected,
COUNT (case when tab_3.col_8 = 'Parts Shortage' and tab_3.col_7 = 1 then tab_3.col_7 end) over (partition by tab_1.col_1, tab_1.col_2) AS parts_category1,
COUNT (case when tab_3.col_8 = 'Unsrv Asset' then tab_3.col_4 end) over (partition by tab_1.col_1, tab_1.col_2) AS u_shortages,
COUNT (case when tab_3.col_8 = 'Unsrv Asset' then tab_3.col_10 end) over (partition by tab_1.col_1, tab_1.col_2) AS u_affected,
COUNT (DISTINCT (case when tab_3.col_8 = 'Unsrv Asset' and tab_3.col_7 = 1 then TRIM(tab_3.col_7) end)) over (partition by tab_1.col_1, tab_1.col_2) AS unsrv_cat1,
TO_CHAR(tab_1.col_11, 'MM/DD/YY') AS report_date
from tab_1
left outer join tab_2
on tab_2.col_1 = tab_1.col_1
left outer join tab_3
on tab_3.col_9 = tab_1.col_1
AND tab_3.col_6 = tab_1.col_2
AND tab_3.col_5 IS NULL;
After looking closer at the query above to explain what I did, I modified it further. This was because I was grouping by TO_CHAR(tab_1.col_11, 'MM/DD/YY'), but saw this wasn't part of the correlated subquery criteria, so it had to change (unless the combination of tab_1.col_1 and col_2 are unique by themselves).
So now to try to explain this:
Basically what is happening in the original query is that for every row of tab_1, you are running several queries on tab_3. So instead of that, I changed it to an outer join on tab_3. Since I don't know the data, this had to be an outer join because a correlated subquery won't eliminate any rows from the final output, where an inner join might. I just joined to tab_3 once, since all of the subqueries were joining tab_3 back to tab_1 using the same fields. I just moved the specific logic for each subquery into a case statement inside the count so that the case would return null (and thus not get counted) if the criteria wasn't met. This newest version uses aggregate functions to get my calculations at the correct level (tab_1 col_1 and col_2 which is what the original subquery was joining based on). Since converting the subquery to an outer join could have possibly created more rows (if there isn't a 1-to-1 match between the tables) I added the distinct to only get one row for each row in tab_1. All rows for each row in tab_1 should be the same. If there were already duplicates in tab_1, you will have to do something a little more in-depth to get keep the number of records the same.
Hopefully this makes sense. If you have questions, feel free to ask and I will do my best to explain further.
--------------------------- More explanation
@shawno: Kind of, but I don't really think of it much as a loop.. Using your simplified example, let's pretend your tables look like this:
TAB_1:
col_1 col_2
-------- ---------
A B
C D
TAB_3:
col_9 col_6 col_4
-------- --------- ---------
A B X
A B Y
A B Z
C D X
C D X
Using the subquery method, you are looking at each row of tab_1 and then running a query against tab_3. So you would do:
for row col_1 = A, col_2 = B, run a select count(distinct(col_4)) on tab_3 where col_9 = A and col_6 = B. This returns the value 3, which is the value the subquery returns.
for row col_1 = C, col_2 = D, run a select count(distinct(col_4)) on tab_3 where col_9 = C and col_6 = D. This returns the value 1, which is the value the subquery returns.
Using the join method, you first join the tables giving you data like:
col_1 col_2 col_9 col_6 col_4
-------- --------- -------- --------- ---------
A B A B X
A B A B Y
A B A B Z
C D C D X
C D C D X
So now you just have to make a query based on that data, doing a Count(distinct(col_4)) for each value of col_1, col_2. If you know what your data looks like, you can create a more efficient query, but the idea remains the same.
Hope this makes it a little more clear!