Search code examples
postgresqljsonbpostgresql-9.6

How do I efficiently calculate summary stats on JSONB arrays nested in Postgres?


Using Postgres 9.6.

I have this working but suspect there's a more efficient way. What's the best way to calculate AVG, SUM, etc. on the MyEventLength arrays?

DROP TABLE IF EXISTS activity;
DROP SEQUENCE IF EXISTS activity_id_seq;
CREATE SEQUENCE activity_id_seq;

CREATE TABLE activity (
    id INT CHECK (id > 0) NOT NULL DEFAULT NEXTVAL ('activity_id_seq'),
    user_id INT,
    events JSONB
);

INSERT INTO activity (user_id,events) VALUES
(1, '{"MyEvent":{"MyEventLength":[450,790,1300,5400],"MyEventValue":[334,120,120,940]}}'),
(1, '{"MyEvent":{"MyEventLength":[12],"MyEventValue":[4]}}'),
(2, '{"MyEvent":{"MyEventLength":[450,790,1300,5400],"MyEventValue":[334,120,120,940]}}'),
(1, '{"MyEvent":{"MyEventLength":[1000,2000],"MyEventValue":[450,550]}}');

To date, this is the best way I can figure out to calculate the average for the MyEventLength array for user_id 1:

SELECT avg(recs::text::numeric) FROM (
    SELECT jsonb_array_elements(a.event_length) as recs FROM (
        SELECT events->'MyEvent'->'MyEventLength' as event_length from activity
        WHERE user_id = 1
    )a
) b;

Or this variation:

SELECT avg(recs) FROM (
    SELECT jsonb_array_elements_text(a.event_length)::numeric as recs FROM (
        SELECT events->'MyEvent'->'MyEventLength' as event_length from activity
        WHERE user_id = 1
    )a
) b;

Is there a better way to do this that does not require as many sub selects?


Solution

  • You need to pass rows with scalar values to avg(), otherwise (if you'll try to pass the output of some set-returning function like jsonb_array_elements_text(..)) you will get an errors such as this:

    ERROR:  set-valued function called in context that cannot accept a set
    

    So you definitely need at least 1 sub-query or CTE.

    Option 1, w/o CTE:

    select avg(v::numeric)
    from (
      select
        jsonb_array_elements_text(events->'MyEvent'->'MyEventLength')
      from activity
      where user_id = 1
    ) as a(v);
    

    Option 2, CTE (readability is better):

    with vals as (
      select
        jsonb_array_elements_text(events->'MyEvent'->'MyEventLength')::numeric as val
      from activity
      where user_id = 1
    )
    select avg(val)
    from vals
    ;
    

    UPDATE, Option 3: It turned out, that you can do it w/o any nested queries, using implicit JOIN LATERAL:

    select avg(val::text::numeric)
    from activity a, jsonb_array_elements(a.events->'MyEvent'->'MyEventLength') vals(val)
    where user_id = 1;