Search code examples
sqldb2subquerysyntax-errorcorrelated-subquery

How to include correlated subquery clause in SQL Join?


So I'm trying to write a query that includes an IN clause within a join. The reason is because I need to join DLGVRSN to itself, and when the first one's DLGVRSN.DSTATUS = 500 then I need it to get the most-recent version of the second one that's been updated before the first DLGVRSN's date.

SELECT
    *
FROM
    CCCX.DLGVRSN
    LEFT JOIN CCCX.QSTVRSN
        ON QSTVRSN.DVFK = DLGVRSN.DID
    LEFT JOIN CCCX.DLGVRSN SQS_DLGVRSN
        ON SQS_DLGVRSN.DHFK = QSTVRSN.DSUBFK
        AND (
            (
                DLGVRSN.DSTATUS = 500
                AND SQS_DLGVRSN.DID IN ( -- Error occurs on this line.
                    SELECT DID
                    FROM CCCX.DLGVRSN LATEST_SQS_BEFORE_DATE
                    WHERE
                        LATEST_SQS_BEFORE_DATE.DHFK = QSTVRSN.DSUBFK
                        AND LATEST_SQS_BEFORE_DATE.DATEUPD <= IFNULL(DLGVRSN.DATEUPD, DLGVRSN.DATECRT)
                    ORDER BY DATEUPD DESC
                    FETCH FIRST ROW ONLY
                )
            ) OR (
                DLGVRSN.DSTATUS <> 500
                AND SQS_DLGVRSN.DSTATUS = 100
            )
        )
WHERE
    DLGVRSN.DID = 32367

However, it complains:

[SQL0115] Comparison operator IN not valid. Cause . . . . . :   Simple comparison operators other than equal and not equal cannot be used with a list of items.  ANY, ALL, and SOME comparison operators must be followed by a fullselect, rather than an expression or a list of items. Subqueries cannot be specified in a JOIN condition or in a CASE expression. Recovery  . . . :   Change either the comparison or the operand.  Try the request again.

Looking at other Questions on here, it was suggested that I change it to use a separate join, instead. So I tried changing it to this:

SELECT
    *
FROM
    CCCX.DLGVRSN
    LEFT JOIN CCCX.QSTVRSN
        ON QSTVRSN.DVFK = DLGVRSN.DID
    LEFT JOIN (
        SELECT
            LATEST_SQS_BEFORE_DATE.DID
            , LATEST_SQS_BEFORE_DATE.DHFK
        FROM CCCX.DLGVRSN LATEST_SQS_BEFORE_DATE
        WHERE
            LATEST_SQS_BEFORE_DATE.DHFK = QSTVRSN.DSUBFK -- Error occurs on this line.
            AND LATEST_SQS_BEFORE_DATE.DATEUPD <= IFNULL(DLGVRSN.DATEUPD, DLGVRSN.DATECRT)
        ORDER BY DATEUPD DESC
        FETCH FIRST ROW ONLY
    ) HISTORIC_MODEL_SQS_IDS
        ON HISTORIC_MODEL_SQS_IDS.DHFK = QSTVRSN.DSUBFK 
    LEFT JOIN CCCX.DLGVRSN SQS_DLGVRSN
        ON SQS_DLGVRSN.DHFK = QSTVRSN.DSUBFK
        AND (
            (
                DLGVRSN.DSTATUS = 500
                AND SQS_DLGVRSN.DID = HISTORIC_MODEL_SQS_IDS.DID
            ) OR (
                DLGVRSN.DSTATUS <> 500
                AND SQS_DLGVRSN.DSTATUS = 100
            )
        )
WHERE
    DLGVRSN.DID = 32367

But when I do that, it complains about not finding DLGVRSN nor QSTVRSN:

[SQL0206] Column or global variable DSUBFK not found. Cause . . . . . :   DSUBFK was not found as a column of table *N in *N and was not found as a global variable in *N. If the table is *N, DSUBFK is not a column of any table or view that can be referenced, or DSUBFK is a special register that cannot be set in an atomic compound statement. Recovery  . . . :   Do one of the following and try the request again: -- Ensure that the column and table names are specified correctly in the statement. -- If this is a SELECT statement, ensure that all the required tables were named in the FROM clause. -- If the column was intended to be a correlated reference, qualify the column with the correct table designator. -- If the column was intended to be a global variable, qualify the name with the schema where the global variable exists or ensure the schema is in the path. -- If this is a SET statement for a special register within an atomic compound dynamic statement, remove the statement or remove the ATOMIC keyword.

I likewise tried to change the subquery to use MAX(DID) so I could use = instead of IN. That gives the same error as the second example above.

So... how can I accomplish this?

Relevant DDL:

CREATE TABLE CCCX/DLGVRSN ( 
    DID DECIMAL(11, 0) GENERATED ALWAYS AS IDENTITY ( 
    START WITH 1 INCREMENT BY 1 
    MINVALUE 1 NO MAXVALUE 
    NO CYCLE ORDER 
    CACHE 20 ) 
    , 
    DSTATUS NUMERIC(3, 0) NOT NULL DEFAULT 0 , 
    DHFK DECIMAL(11, 0) NOT NULL DEFAULT 0 , 
    DATECRT TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP , 
    DATEUPD TIMESTAMP DEFAULT NULL , 
    CONSTRAINT CCCX/Q_DLGVRSN_DID_00001 PRIMARY KEY( DID ) )   
      
    RCDFMT CC0102P    ;
  

CREATE TABLE CCCX/QSTVRSN ( 
    QID DECIMAL(11, 0) GENERATED ALWAYS AS IDENTITY ( 
    START WITH 1 INCREMENT BY 1 
    MINVALUE 1 NO MAXVALUE 
    NO CYCLE ORDER 
    CACHE 20 ) 
    ,
    DVFK DECIMAL(11, 0) NOT NULL DEFAULT 0 , 
    DSUBFK DECIMAL(11, 0) DEFAULT NULL , 
    CONSTRAINT CCCX/Q_QUESTION_QID_00001 PRIMARY KEY( QID ) )   
      
    RCDFMT CC0108P    ;
  
ALTER TABLE CCCX/QSTVRSN 
    ADD CONSTRAINT CCCX/Q_QSTVRSN_DVFK_00001 
    FOREIGN KEY( DVFK ) 
    REFERENCES CCCX/DLGVRSN ( DID ) 
    ON DELETE NO ACTION 
    ON UPDATE NO ACTION ;

Solution

  • When you code a subquery that refers to a column of an other table of the from clause, you have to use the LATERAL keyword to make it available inside the subquery

    SELECT
        *
    FROM
        CCCX.DLGVRSN
        LEFT JOIN CCCX.QSTVRSN
            ON QSTVRSN.DVFK = DLGVRSN.DID
        LEFT JOIN LATERAL (
            SELECT
                LATEST_SQS_BEFORE_DATE.DID
                , LATEST_SQS_BEFORE_DATE.DHFK
            FROM CCCX.DLGVRSN LATEST_SQS_BEFORE_DATE
            WHERE
                LATEST_SQS_BEFORE_DATE.DHFK = QSTVRSN.DSUBFK -- Error occurs on this line.
                AND LATEST_SQS_BEFORE_DATE.DATEUPD <= IFNULL(DLGVRSN.DATEUPD, DLGVRSN.DATECRT)
            ORDER BY DATEUPD DESC
            FETCH FIRST ROW ONLY
        ) HISTORIC_MODEL_SQS_IDS
            ON HISTORIC_MODEL_SQS_IDS.DHFK = QSTVRSN.DSUBFK 
        LEFT JOIN CCCX.DLGVRSN SQS_DLGVRSN
            ON SQS_DLGVRSN.DHFK = QSTVRSN.DSUBFK
            AND (
                (
                    DLGVRSN.DSTATUS = 500
                    AND SQS_DLGVRSN.DID = HISTORIC_MODEL_SQS_IDS.DID
                ) OR (
                    DLGVRSN.DSTATUS <> 500
                    AND SQS_DLGVRSN.DSTATUS = 100
                )
            )
    WHERE
        DLGVRSN.DID = 32367