Search code examples
postgresqlaggregate-functions

Postgres forces use of COALESCE while the value cannot be null


I am running a troubling request on Postgres and cannot figure out the following point.

Note : The following requests do not need tables, you may try them in any postgres prompt.

Given the following sample data :

SELECT
*
FROM jsonb_to_recordset('[{"cat": "A", "key": null, "value": null}, {"cat": "A", "key": "key_A", "value": "val_A"}, {"cat": "B", "key": null, "value": null}, {"cat": "C", "key": "key_C", "value": "val_C"}]')
AS x(cat text, key text, value text);
 cat |  key  | value 
-----+-------+-------
 A   |       | 
 A   | key_A | val_A
 B   |       | 
 C   | key_C | val_C
(4 rows)

The following query is working fine :

SELECT
cat,
CASE WHEN bool_or(key IS NULL) THEN null
ELSE
   jsonb_object_agg(COALESCE(key, '???'), value)
END AS obj
FROM jsonb_to_recordset('[{"cat": "A", "key": null, "value": null}, {"cat": "A", "key": "key_A", "value": "val_A"}, {"cat": "B", "key": null, "value": null}, {"cat": "C", "key": "key_C", "value": "val_C"}]')
AS x(cat text, key text, value text)
GROUP BY cat 
 cat |        obj         
-----+--------------------
 B   | 
 C   | {"key_C": "val_C"}
 A   | 
(3 rows)

But I do not understand why I have to use the COALESCE since I guess at this point the value CANNOT be null (as you can see in the result there is no ??? key.

However, when I remove the COALESCE :

SELECT
cat,
CASE WHEN bool_or(key IS NULL) THEN null
ELSE
   jsonb_object_agg(key, value)
END AS obj
FROM jsonb_to_recordset('[{"cat": "A", "key": null, "value": null}, {"cat": "A", "key": "key_A", "value": "val_A"}, {"cat": "B", "key": null, "value": null}, {"cat": "C", "key": "key_C", "value": "val_C"}]')
AS x(cat text, key text, value text)
GROUP BY cat 
psql:commands.sql:9: ERROR:  field name must not be null

I would appreciate any enlightning ! Cheers.


Solution

  • The aggregates are calculated for each row simultaneously. The first value for jsonb_object_agg() is computed when the result of bool_or() is not yet known. You can illustrate this mechanism with a trivial custom aggregate:

    create or replace function my_agg_function(int, int)
    returns int language plpgsql as $$
    begin
        raise notice 'called for %', $2;
        return coalesce($1, 0)+ coalesce($2, 0);
    end $$;
    
    create aggregate my_agg(int) (
        sfunc = my_agg_function,
        stype = int);
    

    In the below query my_agg_function() is executed three times because the result of bool_or() is only known when all rows are processed:

    select case when bool_or(i = 10) then my_agg(i) end
    from generate_series(1, 3) as i;
    
    NOTICE:  called for 1
    NOTICE:  called for 2
    NOTICE:  called for 3
     case
    ------
    
    (1 row)