I have the following join a table to the most recent record for a given EMPLOYE_ID
and I am wondering if there is a more efficient/faster way of retrieving the most recent record, what would be the best way?
SELECT * FROM EMPLOYEE
WHERE NOT EXISTS (
SELECT 1
FROM EMPLOYEE D
JOIN EMPLOYEE_HISTORY E
ON E.EMPLOYEE_ID = D.EMPLOYEE_ID
AND E.CREATE_DATE IN (SELECT MAX(CREATE_DATE)
FROM EMPLOYEE_HISTORY
WHERE EMPLOYEE_ID = D.EMPLOYEE_ID)
)
When I compared the explain plan to the following query it seems the below way is MORE costly.
SELECT *
FROM EMPLOYEE
WHERE NOT EXISTS
(SELECT 1
FROM EMPLOYEE D
JOIN (
SELECT E.*
FROM EMPLOYEE_HISTORY E
INNER JOIN (
SELECT EMPLOYEE_ID
, MAX(CREATE_DATE) max_date
FROM EMPLOYEE_HISTORY E2
GROUP BY EMPLOYEE_ID
) EE
ON EE.EMPLOYEE_ID = E.EMPLOYEE_ID
AND EE.max_date = E.CREATE_DATE
) A
ON A.EMPLOYEE_ID = D.EMPLOYEE_ID
AND ROWNUM = 1)
So does that mean it is indeed better?
There is no index on CREATE_DATE, however the PK is on EMPLOYEE_ID, CREATE_DATE
Use the RANK
(or DENSE_RANK
or ROW_NUMBER
) analytic function:
SELECT 1
FROM EMPLOYEE E
JOIN (
SELECT *
FROM (
SELECT H.*,
RANK() OVER ( PARTITION BY EMPLOYEE_ID ORDER BY CREATE_DATE DESC ) AS rnk
FROM EMPLOYEE_HISTORY H
)
WHERE rnk = 1
) H
ON H.EMPLOYEE_ID = E.EMPLOYEE_ID