Take a look at this T-SQL query. It has a nested query. I'm trying to make it work without the nested query in the same way. The reason I'm doing this is because I'd like to write something similar in Netezza, but Netezza's query engine doesn't allow you to reference tables of an outer query inside a nested query. Notice in my dummy data, I have whether it's a branch or leaf hard coded, but in my first query, I'm calculating whether it's a branch or leaf, and it comes out the same.
At the end of my question, I have my dummy data and a simple query so you can see what's there.
SELECT
DISTINCT F1.[PATH],
F1.BRANCH_OR_LEAF,
(
SELECT
COUNT(DISTINCT [FILE].ID)
FROM FOLDER F2
JOIN [FILE] ON F2.ID = [FILE].FOLDER_ID
WHERE
F2.[PATH] LIKE (F1.[PATH] + '%')
) file_count,
(
SELECT
CASE WHEN
MAX(COALESCE(F2.ID, -1)) != COALESCE(F1.ID, -1)
THEN
'B'
ELSE
'L'
END AS BRANCH_OR_LEAF
FROM FOLDER F2
JOIN [FILE] ON F2.ID = [FILE].FOLDER_ID
WHERE
F2.[PATH] LIKE (F1.[PATH] + '%')
) branch_or_leaf
FROM
[FOLDER] F1
ORDER BY
F1.[PATH]
Query where aggregate MAX(COALESCE not working:
SELECT
F1.PATH AS FOLDER_PATH,
COUNT(DISTINCT F.ID) AS FILE_COUNT,
CASE
WHEN COUNT(DISTINCT F.ID) > 0 THEN 'A'
ELSE 'H'
END,
(
LEN(F1.PATH) - LEN(REPLACE(F1.PATH, '/', '')) - 1
) AS FOLDER_LEVEL
--,
--CASE
-- WHEN MAX(coalesce(F1.ID,'-1')) != coalesce(F2.ID,'-1') THEN 'B'
-- ELSE 'L'
--END AS BRANCH_OR_LEAF
FROM
[FOLDER] F1
LEFT JOIN [FOLDER] F2 ON F2.PATH LIKE (F1.PATH + '%')
JOIN [FILE] F ON F2.ID = F.FOLDER_ID
GROUP BY
F1.PATH
When I uncomment the comment, I get this.
Column 'FOLDER.ID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Here's what my current dummy data looks like.
SELECT F1.ID, F1.[PATH], F1.BRANCH_OR_LEAF, F.ID, F.NAME
FROM [FOLDER] F1
JOIN [dbo].[FILE] F ON F1.ID = F.FOLDER_ID
This should probably allow me to accomplish what I want in Netezza. In a realistic scenario, you can't confirm that the LEN function is sufficient, but since we're already checking if one string contains the other, this should be fine. I would use the CHECKSUM_AGG function, but Netezza doesn't use that. So this will have to do.
create table #branch_or_leaf
(
[folder_id] int,
[path] nvarchar(50), -- for reference purposes only
[branch_or_leaf] [nvarchar](50)
)
INSERT INTO #branch_or_leaf
SELECT
DISTINCT F1.ID,
F1.PATH,
CASE
WHEN SUM(LEN(F2.PATH)) = LEN(F1.PATH) THEN 'L'
ELSE 'B'
END
FROM
[FOLDER] F1
LEFT JOIN [FOLDER] F2 ON F2.PATH LIKE (F1.PATH + '%')
GROUP BY
F1.ID, F1.PATH
SELECT * FROM #branch_or_leaf
SELECT
DISTINCT F1.PATH AS FOLDER_PATH,
COUNT(DISTINCT F.ID) AS FILE_COUNT,
CASE
WHEN COUNT(DISTINCT F.ID) > 0 THEN 'A'
ELSE 'H'
END,
(
LEN(F1.PATH) - LEN(REPLACE(F1.PATH, '/', '')) - 1
) AS FOLDER_LEVEL,
BL.branch_or_leaf AS branch_or_leaf
FROM
[FOLDER] F1
LEFT JOIN [FOLDER] F2 ON F2.PATH LIKE (F1.PATH + '%')
JOIN [FILE] F ON F2.ID = F.FOLDER_ID
JOIN #branch_or_leaf BL ON F1.ID = BL.folder_id
GROUP BY
F1.PATH, BL.branch_or_leaf
ORDER BY
F1.PATH
DROP TABLE #branch_or_leaf
Similar query, but now we have our nested query in the FROM clause. This might be legal in Netezza.
SELECT
DISTINCT F1.PATH AS FOLDER_PATH,
COUNT(DISTINCT F.ID) AS FILE_COUNT,
CASE
WHEN COUNT(DISTINCT F.ID) > 0 THEN 'A'
ELSE 'H'
END,
(
LEN(F1.PATH) - LEN(REPLACE(F1.PATH, '/', '')) - 1
) AS FOLDER_LEVEL,
BL.BL AS branch_or_leaf
FROM
[FOLDER] F1
LEFT JOIN [FOLDER] F2 ON F2.PATH LIKE (F1.PATH + '%')
JOIN [FILE] F ON F2.ID = F.FOLDER_ID
JOIN
(
SELECT
DISTINCT F1.ID,
CASE
WHEN SUM(LEN(F2.PATH)) = LEN(F1.PATH) THEN 'L'
ELSE 'B'
END AS BL
FROM
[FOLDER] F1
LEFT JOIN [FOLDER] F2 ON F2.PATH LIKE (F1.PATH + '%')
GROUP BY
F1.ID, F1.PATH
) AS BL ON F1.ID = BL.ID
GROUP BY
F1.PATH, BL.BL
ORDER BY
F1.PATH