I am running a troubling request on Postgres and cannot figure out the following point.
Note : The following requests do not need tables, you may try them in any postgres prompt.
Given the following sample data :
SELECT
*
FROM jsonb_to_recordset('[{"cat": "A", "key": null, "value": null}, {"cat": "A", "key": "key_A", "value": "val_A"}, {"cat": "B", "key": null, "value": null}, {"cat": "C", "key": "key_C", "value": "val_C"}]')
AS x(cat text, key text, value text);
cat | key | value
-----+-------+-------
A | |
A | key_A | val_A
B | |
C | key_C | val_C
(4 rows)
The following query is working fine :
SELECT
cat,
CASE WHEN bool_or(key IS NULL) THEN null
ELSE
jsonb_object_agg(COALESCE(key, '???'), value)
END AS obj
FROM jsonb_to_recordset('[{"cat": "A", "key": null, "value": null}, {"cat": "A", "key": "key_A", "value": "val_A"}, {"cat": "B", "key": null, "value": null}, {"cat": "C", "key": "key_C", "value": "val_C"}]')
AS x(cat text, key text, value text)
GROUP BY cat
cat | obj
-----+--------------------
B |
C | {"key_C": "val_C"}
A |
(3 rows)
But I do not understand why I have to use the COALESCE
since I guess at this point the value CANNOT be null (as you can see in the result there is no ???
key.
However, when I remove the COALESCE
:
SELECT
cat,
CASE WHEN bool_or(key IS NULL) THEN null
ELSE
jsonb_object_agg(key, value)
END AS obj
FROM jsonb_to_recordset('[{"cat": "A", "key": null, "value": null}, {"cat": "A", "key": "key_A", "value": "val_A"}, {"cat": "B", "key": null, "value": null}, {"cat": "C", "key": "key_C", "value": "val_C"}]')
AS x(cat text, key text, value text)
GROUP BY cat
psql:commands.sql:9: ERROR: field name must not be null
I would appreciate any enlightning ! Cheers.
The aggregates are calculated for each row simultaneously. The first value for jsonb_object_agg()
is computed when the result of bool_or()
is not yet known. You can illustrate this mechanism with a trivial custom aggregate:
create or replace function my_agg_function(int, int)
returns int language plpgsql as $$
begin
raise notice 'called for %', $2;
return coalesce($1, 0)+ coalesce($2, 0);
end $$;
create aggregate my_agg(int) (
sfunc = my_agg_function,
stype = int);
In the below query my_agg_function()
is executed three times because the result of bool_or()
is only known when all rows are processed:
select case when bool_or(i = 10) then my_agg(i) end
from generate_series(1, 3) as i;
NOTICE: called for 1
NOTICE: called for 2
NOTICE: called for 3
case
------
(1 row)