My requirement is to convert an ORACLE SQL query to databricks SQL unity catalog SQL query. Databricks SQL doesnt support scalar subqueries as a join condition, what I need to convert is :
inner join `uc-dev`.iccu_dna.RtxnStatHist rsh
ON r.AcctNbr = rsh.AcctNbr
AND r.RtxnNbr = rsh.RtxnNbr
AND rsh.TimeUniqueExtn = (SELECT MAX(TIMEUNIQUEEXTN)
FROM `uc-dev`.iccu_dna.RtxnStatHist rsh2
WHERE AcctNbr = rsh.AcctNbr
AND RtxnNbr = rsh.RtxnNbr)
I have tried to use CTE like so:
with
max_rsh
as
(SELECT MAX(TIMEUNIQUEEXTN) as max_TIMEUNIQUEEXTN
FROM `uc-dev`.iccu_dna.RtxnStatHist rsh2
WHERE AcctNbr IN (SELECT AcctNbr FROM `uc-dev`.iccu_dna.RtxnStatHist)
AND RtxnNbr IN (SELECT RtxnNbr FROM `uc-dev`.iccu_dna.RtxnStatHist)
)
and later joining this CTE :
inner join max_rsh mtxn
ON rsh.TimeUniqueExtn = mtxn.max_TIMEUNIQUEEXTN
Can you help me with this conversion? I need to be able to replicate this join condition in a databricks SQL notebook for unity catalog.
You can try the below approach to databricks SQL:
SELECT *
FROM `uc-dev`.iccu_dna.Rtxn r
INNER JOIN `uc-dev`.iccu_dna.RtxnStatHist rsh
ON r.AcctNbr = rsh.AcctNbr
AND r.RtxnNbr = rsh.RtxnNbr
AND rsh.TimeUniqueExtn = (
SELECT MAX(TIMEUNIQUEEXTN)
FROM `uc-dev`.iccu_dna.RtxnStatHist rsh2
WHERE AcctNbr = rsh.AcctNbr
AND RtxnNbr = rsh.RtxnNbr
)
I have tried the below approach as an example:
INPUT tables:
WITH maxTime AS (
SELECT
rsh.AcctNbr,
rsh.RtxnNbr,
rsh.TimeUniqueExtn,
ROW_NUMBER() OVER (PARTITION BY rsh.AcctNbr, rsh.RtxnNbr ORDER BY rsh.TimeUniqueExtn DESC) AS rn
FROM table1 r
INNER JOIN RtxnStatHist rsh
ON r.AcctNbr = rsh.AcctNbr
AND r.RtxnNbr = rsh.RtxnNbr
)
SELECT
r.AcctNbr,
r.RtxnNbr,
rsh.TimeUniqueExtn
FROM table1 r
INNER JOIN maxTime rsh
ON r.AcctNbr = rsh.AcctNbr
AND r.RtxnNbr = rsh.RtxnNbr
AND rsh.rn = 1;
Results:
AcctNbr RtxnNbr TimeUniqueExtn
1 101 2022-01-01T11:00:00Z
2 102 2022-01-01T12:00:00Z
3 103 2022-01-01T14:00:00Z
In the above code a Common table expression (CTE) named maxTime is created.
The CTE calculates the maximum TimeUniqueExtn for each (AcctNbr, RtxnNbr) combination using the ROW_NUMBER()
window function.
Then, the main query joins table1 with the maxTime CTE based on the account number, transaction number, and only selects the rows where the row number is 1 (which represents the maximum TimeUniqueExtn for each combination).