ARRAY_AGG function omits NULL
by definition.
The requirement is to generate array with the following format:
SELECT [NULL, 'a1', 'a2', 'a3', NULL]
-- [ undefined, "a1", "a2", "a3", undefined ]
from source data using ARRAY_AGG
function and nullifying all entries not starting with letter a
:
CREATE OR REPLACE TABLE tab(id INT, val TEXT) AS
SELECT * FROM VALUES (1,'b0'), (2,'a1'), (3,'a2'), (4,'a3'), (5,'b1');
Attempt 1:
SELECT ARRAY_AGG(CASE WHEN val LIKE 'a%' THEN val END)
WITHIN GROUP (ORDER BY id) AS result,
result = [NULL, 'a1', 'a2', 'a3', NULL] AS is_output_as_requested
FROM tab;
-- [ "a1", "a2", "a3" ] FALSE
Result: ARRAY_AGG
works as intended and skips NULL
; The output does not match the requirement.
Attempt 2:
SELECT ARRAY_AGG(CASE WHEN val LIKE 'a%' THEN val ELSE PARSE_JSON('null') END)
WITHIN GROUP (ORDER BY id) AS result,
result = [NULL, 'a1', 'a2', 'a3', NULL] AS is_output_as_requested
FROM tab;
-- [ "a1", "a2", "a3" ] FALSE
Rationale: CASE
expression output must have the same data type, therefore implicit conversion: TO_CHAR(PARSE_JSON('null'))
<=> NULL
Attempt 3:
SELECT ARRAY_AGG(CASE WHEN val LIKE 'a%' THEN val::VARIANT ELSE PARSE_JSON('null') END)
WITHIN GROUP (ORDER BY id)AS result,
result = [NULL, 'a1', 'a2', 'a3', NULL] AS is_output_as_requested
FROM tab;
-- [ null, "a1", "a2", "a3", null ] FALSE
Almost what is requested, though not quite. Type of the first and last element is NULL_VALUE
.
I am aware of VARIANT NULL. I am seeking for a way to "preserve NULLs" in exactly the format as requested for ARRAY_AGG
.
The solution must be a standalone SQL expression (no Snowflake Scripting/UDFs).
It is possible to achieve it by aggregating arrays instead of scalar values:
SELECT ARRAY_FLATTEN(ARRAY_AGG(CASE WHEN val LIKE 'a%' THEN [val] ELSE [NULL] END)
WITHIN GROUP (ORDER BY id)) AS result,
result = [NULL, 'a1', 'a2', 'a3', NULL] AS is_output_as_requested
FROM tab;
-- [ undefined, "a1", "a2", "a3", undefined ] TRUE
Rationale: ARRAY_AGG
does not skip [NULL]
.
Step 1:
SELECT ARRAY_AGG(CASE WHEN val LIKE 'a%' THEN [val] ELSE [NULL] END)
WITHIN GROUP (ORDER BY id) AS result
FROM tab;
-- [ [undefined], [ "a1"], ["a2"], ["a3"], [undefined] ]
Step 2:
SELECT ARRAY_FLATTEN(...)
-- [ undefined, "a1", "a2", "a3", undefined ]
In general:
SELECT ARRAY_FLATTEN(ARRAY_AGG([some_col]))
FROM tab;