Search code examples
postgresqlaggregate-functionsjsonb

Grouping by unique values inside a JSONB array


Consider the following table structure:

CREATE TABLE residences (id int, price int, categories jsonb);

INSERT INTO residences VALUES
  (1, 3, '["monkeys", "hamsters", "foxes"]'),
  (2, 5, '["monkeys", "hamsters", "foxes", "foxes"]'),
  (3, 7, '[]'),
  (4, 11, '["turtles"]');

SELECT * FROM residences;

 id | price |                categories
----+-------+-------------------------------------------
  1 |     3 | ["monkeys", "hamsters", "foxes"]
  2 |     5 | ["monkeys", "hamsters", "foxes", "foxes"]
  3 |     7 | []
  4 |    11 | ["turtles"]

Now I would like to know how many residences there are for each category, as well as their sum of prices. The only way I found was to do this was using a sub-query:

SELECT category, SUM(price), COUNT(*) AS residences_no
FROM
  residences a,
  (
    SELECT DISTINCT(jsonb_array_elements(categories)) AS category
    FROM residences
  ) b
WHERE a.categories @> category
GROUP BY category
ORDER BY category;

  category  | sum | residences_no
------------+-----+---------------
 "foxes"    |   8 |             2
 "hamsters" |   8 |             2
 "monkeys"  |   8 |             2
 "turtles"  |  11 |             1

Using jsonb_array_elements without subquery would return three residences for foxes because of the duplicate entry in the second row. Also the price of the residence would be inflated by 5.

Is there any way to do this without using the sub-query, or any better way to accomplish this result?

EDIT

Initially I did not mention the price column.


Solution

  • select category, count(distinct (id, category))
    from residences, jsonb_array_elements(categories) category
    group by category
    order by category;
    
      category  | count 
    ------------+-------
     "foxes"    |     2
     "hamsters" |     2
     "monkeys"  |     2
     "turtles"  |     1
    (4 rows)
    

    You have to use a derived table to aggregate another column (all prices at 10):

    select category, count(*), sum(price) total
    from (
        select distinct id, category, price
        from residences, jsonb_array_elements(categories) category
    ) s
    group by category
    order by category;
    
      category  | count | total 
    ------------+-------+-------
     "foxes"    |     2 |    20
     "hamsters" |     2 |    20
     "monkeys"  |     2 |    20
     "turtles"  |     1 |    10
    (4 rows)