I made a query which works, but is a little slow in my opinion. When I suppress the output to 10 rows, it takes 13 minutes to perform the query. This is the query, stripped from some stuff:
SELECT
(SELECT ANSWER
FROM (
SELECT to_number(fiit.ANSWER, '999') ANSWER,
foin.CLIENT_ID id,
foin.STARTDATE start_date,
row_number() over(PARTITION BY foin.CLIENT_ID ORDER BY foin.FORM_ID ASC) rnk
FROM forms_filled foin, forms_items_filled fiit, treatment trtm
WHERE foin.FORM_ID = fiit.FORM_ID
AND foin.CLIENT_ID = trtm.CLIENT_ID
AND fiit.FORM_NUMBER = 607
AND fiit.FORM_ITEM_NUMBER = 3779
AND length(fiit.ANSWER) >= 1
AND trtm.TREATMENTCODE = 'K'
AND trtm.ENDDATE BETWEEN TRUNC(to_date('01/01/2014', 'dd/mm/yyyy'), 'DDD') AND TRUNC(to_date('31/12/2014', 'dd/mm/yyyy'), 'DDD')
AND foin.STARTDATE BETWEEN trtm.STARTDATUM AND NVL(trtm.ENDDATE, to_date('01/01/9999', 'dd/mm/yyyy'))
) inn
WHERE rnk = 1
AND inn.id = client.CLIENT_ID
) form1
FROM treatment trtm, CLIENT client
WHERE trtm.TREATMENTCODE = 'K'
AND client.CLIENT_ID = trtm.CLIENT_ID
AND trtm.ENDDATE BETWEEN TRUNC(to_date('01/01/2014', 'dd/mm/yyyy'), 'DDD') AND TRUNC(to_date('31/12/2014', 'dd/mm/yyyy'), 'DDD')
The outer query results in 175 clients who have a particular treatment code and have a treatment enddate in 2014. Now for each of these clients a lot of other data is retrieved (like name, age, treatment hours) which is not relevant and I left out now. Then there are about 30 similar subqueries, which retrieve answers from forms. I used a correlated query, because to retrieve the answers from these forms, the client id must be known. If that was the only thing the subquery needed to find the data, it wouldn't be a problem, but there is an other requirement: the retrieved forms must be filled in within the treatment period, because I couldn't find a way to push this data from the outer query to the sub-sub-query, I query this again in the subsubquery, which is causing the slow speed.
The reason to have a subquery and a subsubquery is because the N-th ranking answer from a form must be found. In a previous version of my code I didn't have the treatmentcode, treatment start and enddate requirements in the where clause of the subsubquery. This caused the subsubquery to come up with e.g. 4 results which were ranked 1,2,3,4 but not necessarily of forms created within the treatment period, which is wrong.
So adding these lines:
AND trtm.TREATMENTCODE = 'K'
AND trtm.ENDDATE BETWEEN TRUNC(to_date('01/01/2014', 'dd/mm/yyyy'), 'DDD')
AND TRUNC(to_date('31/12/2014', 'dd/mm/yyyy'), 'DDD')
AND foin.STARTDATE BETWEEN trtm.STARTDATUM AND NVL(trtm.ENDDATE, to_date('01/01/9999', 'dd/mm/yyyy'))
caused the query to be correct, where it was not completely correct before. They also caused the query to take several hours, instead of ~40 seconds for 175 rows.
My question now is, how can I rewrite this query to make it faster? I use Oracle 11.2.40 in combination with Toad Data Point 3.5, but I can't see the explain plan unfortunately.
If you use the keep
keyword to get the first value, you can dispense with the nested subqueries. This, in turn, allows you to use a query correlated with the outer query, so you don't have to re-calculate the results for all rows to get the value for a given row.
The query would look like:
SELECT (SELECT max(to_number(fiit.ANSWER, '999')) keep (dense_rank first order by foin.FORM_ID ASC)
FROM forms_filled foin JOIN
forms_items_filled fiit
ON foin.FORM_ID = fiit.FORM_ID
WHERE foin.CLIENT_ID = trtm.CLIENT_ID AND
fiit.FORM_NUMBER = 607
fiit.FORM_ITEM_NUMBER = 3779 AND
length(fiit.ANSWER) >= 1 AND
foin.STARTDATE BETWEEN trtm.STARTDATUM AND NVL(trtm.ENDDATE, DATE '1999-01-01')
)
I also encourage you to use modern explicit join
syntax and the date
keyword for expressing date constants.