Search code examples
sqlsql-servert-sqldatabricks-sql

Databricks SQL Will Not Accept Correlated Columns Error: Correlated column is not allowed in a non-equality predicate


I have a T-SQL Query that includes correlated columns. When I attempt to execute the query with Databricks SQL I get the error:

Error in SQL statement: AnalysisException: Correlated column is not allowed in a non-equality predicate:
Aggregate [max(DateOfChange#20975) AS max(DateOfChange)#20988]
+- Filter (((SiiAuditTypeID#20976 = 3) AND (SplitID#20973 = outer(SplitID#20825))) AND outer(StatusID#20829) IN (1,2))
   +- SubqueryAlias sia
      +- SubqueryAlias spark_catalog.dbo.SiiAudit
         +- Relation[SiiAuditID#20972,SplitID#20973,AccountID#20974,DateOfChange#20975,SiiAuditTypeID#20976,UserID#20977,UserName#20978,PortfolioID#20979,PortfolioSplit#20980,SiiAuditEntryKindID#20981,ModelPortfolioID#20982,primary_key_hash#20983,change_key_hash#20984,reject_reason#20985,reject_row#20986] parquet
;
Distinct
+- Project [AccountID#20826, CreatedDate#20827, StatusID#20829, scalar-subquery#20691 [SplitID#20825 && StatusID#20829] AS CancelledDate#20692]
   :  +- Aggregate [max(DateOfChange#20975) AS max(DateOfChange)#20988]
   :     +- Filter (((SiiAuditTypeID#20976 = 3) AND (SplitID#20973 = outer(SplitID#20825))) AND outer(StatusID#20829) IN (1,2))
   :        +- SubqueryAlias sia
   :           +- SubqueryAlias spark_catalog.dbo.SiiAudit

Can someone help refactor the T-SQL such that it executes without correlated/subqueries.

The query is as follows:

SELECT DISTINCT
           s.AccountID,
           s.CreatedDate,
           s.StatusID,
           (
               SELECT MAX(DateOfChange) AS CancelledDate
               FROM CRM.InvestmentInstruction.SiiAudit sia
               WHERE sia.SiiAuditTypeID = 3 -- cancelled
                     AND sia.SplitID = s.SplitID
                     AND s.StatusID IN ( 1, 2 ) -- cancelled, completed
           ) AS CancelledDate
    FROM CRM.InvestmentInstruction.Split s
        INNER JOIN CRM.InvestmentInstruction.SplitPortfolio sp
            ON sp.SplitID = s.SplitID
        INNER JOIN CRM.InvestmentInstruction.InvestmentRequest ir
            ON ir.InvestmentRequestID = sp.InvestmentRequestID
        INNER JOIN CRM.dbo.ModelPortfolio mp
            ON mp.ModelPortfolioID = ir.ModelID
        INNER JOIN
        (
            SELECT DISTINCT
                   mh.ModelPortfolioID
            FROM CRM.dbo.modelHolding mh
                INNER JOIN Securities.dbo.Security sec
                    ON sec.SecurityID = mh.LinkSecurityId
            WHERE sec.IsCashSecurity = 0
        ) mh
            ON mh.ModelPortfolioID = mp.ModelPortfolioID
    WHERE s.TypeID = 0

Solution

  • Can you move the condition outside the subquery this way?:

    case when s.StatusID in (1, 2) then (
               SELECT MAX(DateOfChange)
               FROM CRM.InvestmentInstruction.SiiAudit sia
               WHERE sia.SiiAuditTypeID = 3 -- cancelled
                     AND sia.SplitID = s.SplitID) end as CancelledDate