Search code examples
sql-servert-sqlmaxaggregatecoalesce

Trying to avoid correlated subquery in T-SQL for Netezza re-write - MAX(COALESCE()) not working in aggregate query


Take a look at this T-SQL query. It has a nested query. I'm trying to make it work without the nested query in the same way. The reason I'm doing this is because I'd like to write something similar in Netezza, but Netezza's query engine doesn't allow you to reference tables of an outer query inside a nested query. Notice in my dummy data, I have whether it's a branch or leaf hard coded, but in my first query, I'm calculating whether it's a branch or leaf, and it comes out the same.

At the end of my question, I have my dummy data and a simple query so you can see what's there.

SELECT
    DISTINCT F1.[PATH],  
    F1.BRANCH_OR_LEAF,  
    (
        SELECT
            COUNT(DISTINCT [FILE].ID)
        FROM FOLDER F2
            JOIN [FILE] ON F2.ID = [FILE].FOLDER_ID
        WHERE   
            F2.[PATH] LIKE (F1.[PATH] + '%')

    ) file_count,
    (
        SELECT
            CASE WHEN 
                MAX(COALESCE(F2.ID, -1)) != COALESCE(F1.ID, -1)
            THEN
                'B'
            ELSE
                'L'
            END AS BRANCH_OR_LEAF           
        FROM FOLDER F2
            JOIN [FILE] ON F2.ID = [FILE].FOLDER_ID
        WHERE   
            F2.[PATH] LIKE (F1.[PATH] + '%')

    ) branch_or_leaf
FROM
    [FOLDER] F1
ORDER BY 
    F1.[PATH]

enter image description here

Query where aggregate MAX(COALESCE not working:

SELECT
    F1.PATH AS FOLDER_PATH,
    COUNT(DISTINCT F.ID) AS FILE_COUNT,
    CASE
        WHEN COUNT(DISTINCT F.ID) > 0 THEN 'A'
        ELSE 'H'
    END,
    (
        LEN(F1.PATH) - LEN(REPLACE(F1.PATH, '/', '')) - 1
    ) AS FOLDER_LEVEL
    --,
    --CASE
    --  WHEN MAX(coalesce(F1.ID,'-1')) != coalesce(F2.ID,'-1') THEN 'B'
    --  ELSE 'L'
    --END AS BRANCH_OR_LEAF
FROM
    [FOLDER] F1
    LEFT JOIN [FOLDER] F2 ON F2.PATH LIKE (F1.PATH + '%')
    JOIN [FILE] F ON F2.ID = F.FOLDER_ID
GROUP BY
    F1.PATH

When I uncomment the comment, I get this.

Column 'FOLDER.ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Here's what my current dummy data looks like.

SELECT F1.ID, F1.[PATH], F1.BRANCH_OR_LEAF, F.ID, F.NAME
FROM [FOLDER] F1
JOIN [dbo].[FILE] F ON F1.ID = F.FOLDER_ID

enter image description here


Solution

  • This should probably allow me to accomplish what I want in Netezza. In a realistic scenario, you can't confirm that the LEN function is sufficient, but since we're already checking if one string contains the other, this should be fine. I would use the CHECKSUM_AGG function, but Netezza doesn't use that. So this will have to do.

    create table #branch_or_leaf
    (
        [folder_id] int,
        [path] nvarchar(50),  -- for reference purposes only
        [branch_or_leaf] [nvarchar](50)
    )
    
    INSERT INTO #branch_or_leaf
    SELECT
        DISTINCT F1.ID, 
        F1.PATH,
        CASE 
            WHEN SUM(LEN(F2.PATH)) = LEN(F1.PATH) THEN 'L'
            ELSE 'B'
        END
    FROM
        [FOLDER] F1
        LEFT JOIN [FOLDER] F2 ON F2.PATH LIKE (F1.PATH + '%') 
    GROUP BY 
        F1.ID, F1.PATH
    
    SELECT * FROM #branch_or_leaf
    
    SELECT
        DISTINCT F1.PATH AS FOLDER_PATH,
        COUNT(DISTINCT F.ID) AS FILE_COUNT,
        CASE
            WHEN COUNT(DISTINCT F.ID) > 0 THEN 'A'
            ELSE 'H'
        END,
        (
            LEN(F1.PATH) - LEN(REPLACE(F1.PATH, '/', '')) - 1
        ) AS FOLDER_LEVEL,
        BL.branch_or_leaf AS branch_or_leaf
    FROM
        [FOLDER] F1
        LEFT JOIN [FOLDER] F2 ON F2.PATH LIKE (F1.PATH + '%')
        JOIN [FILE] F ON F2.ID = F.FOLDER_ID
        JOIN #branch_or_leaf BL ON F1.ID = BL.folder_id
    GROUP BY
        F1.PATH, BL.branch_or_leaf
    ORDER BY
        F1.PATH
    
    DROP TABLE #branch_or_leaf
    

    EDIT:

    Similar query, but now we have our nested query in the FROM clause. This might be legal in Netezza.

    SELECT
        DISTINCT F1.PATH AS FOLDER_PATH,
        COUNT(DISTINCT F.ID) AS FILE_COUNT,
        CASE
            WHEN COUNT(DISTINCT F.ID) > 0 THEN 'A'
            ELSE 'H'
        END,
        (
            LEN(F1.PATH) - LEN(REPLACE(F1.PATH, '/', '')) - 1
        ) AS FOLDER_LEVEL,
        BL.BL AS branch_or_leaf
    FROM
        [FOLDER] F1
        LEFT JOIN [FOLDER] F2 ON F2.PATH LIKE (F1.PATH + '%')
        JOIN [FILE] F ON F2.ID = F.FOLDER_ID
        JOIN 
        (
            SELECT
                DISTINCT F1.ID, 
                CASE 
                    WHEN SUM(LEN(F2.PATH)) = LEN(F1.PATH) THEN 'L'
                    ELSE 'B'
                END AS BL
            FROM
                [FOLDER] F1
                LEFT JOIN [FOLDER] F2 ON F2.PATH LIKE (F1.PATH + '%') 
            GROUP BY 
                F1.ID, F1.PATH      
        ) AS BL ON F1.ID = BL.ID
    GROUP BY
        F1.PATH, BL.BL
    ORDER BY
        F1.PATH