I have a T-SQL Query that includes correlated columns. When I attempt to execute the query with Databricks SQL I get the error:
Error in SQL statement: AnalysisException: Correlated column is not allowed in a non-equality predicate:
Aggregate [max(DateOfChange#20975) AS max(DateOfChange)#20988]
+- Filter (((SiiAuditTypeID#20976 = 3) AND (SplitID#20973 = outer(SplitID#20825))) AND outer(StatusID#20829) IN (1,2))
+- SubqueryAlias sia
+- SubqueryAlias spark_catalog.dbo.SiiAudit
+- Relation[SiiAuditID#20972,SplitID#20973,AccountID#20974,DateOfChange#20975,SiiAuditTypeID#20976,UserID#20977,UserName#20978,PortfolioID#20979,PortfolioSplit#20980,SiiAuditEntryKindID#20981,ModelPortfolioID#20982,primary_key_hash#20983,change_key_hash#20984,reject_reason#20985,reject_row#20986] parquet
;
Distinct
+- Project [AccountID#20826, CreatedDate#20827, StatusID#20829, scalar-subquery#20691 [SplitID#20825 && StatusID#20829] AS CancelledDate#20692]
: +- Aggregate [max(DateOfChange#20975) AS max(DateOfChange)#20988]
: +- Filter (((SiiAuditTypeID#20976 = 3) AND (SplitID#20973 = outer(SplitID#20825))) AND outer(StatusID#20829) IN (1,2))
: +- SubqueryAlias sia
: +- SubqueryAlias spark_catalog.dbo.SiiAudit
Can someone help refactor the T-SQL such that it executes without correlated/subqueries.
The query is as follows:
SELECT DISTINCT
s.AccountID,
s.CreatedDate,
s.StatusID,
(
SELECT MAX(DateOfChange) AS CancelledDate
FROM CRM.InvestmentInstruction.SiiAudit sia
WHERE sia.SiiAuditTypeID = 3 -- cancelled
AND sia.SplitID = s.SplitID
AND s.StatusID IN ( 1, 2 ) -- cancelled, completed
) AS CancelledDate
FROM CRM.InvestmentInstruction.Split s
INNER JOIN CRM.InvestmentInstruction.SplitPortfolio sp
ON sp.SplitID = s.SplitID
INNER JOIN CRM.InvestmentInstruction.InvestmentRequest ir
ON ir.InvestmentRequestID = sp.InvestmentRequestID
INNER JOIN CRM.dbo.ModelPortfolio mp
ON mp.ModelPortfolioID = ir.ModelID
INNER JOIN
(
SELECT DISTINCT
mh.ModelPortfolioID
FROM CRM.dbo.modelHolding mh
INNER JOIN Securities.dbo.Security sec
ON sec.SecurityID = mh.LinkSecurityId
WHERE sec.IsCashSecurity = 0
) mh
ON mh.ModelPortfolioID = mp.ModelPortfolioID
WHERE s.TypeID = 0
Can you move the condition outside the subquery this way?:
case when s.StatusID in (1, 2) then (
SELECT MAX(DateOfChange)
FROM CRM.InvestmentInstruction.SiiAudit sia
WHERE sia.SiiAuditTypeID = 3 -- cancelled
AND sia.SplitID = s.SplitID) end as CancelledDate