Search code examples
sqljsonpostgresqljsonb

How to do calculations on json data in Postgres


I'm storing AdWords report data in Postgres. Each report is stored in a table named Reports, which has a jsonb column named 'data'. Each report has json stored in its 'data' field that looks that looks like this:

[
  {
    match_type: "exact",
    search_query: "gm hubcaps",
    conversions: 2,
    cost: 1.24
  },
  {
    match_type: "broad",
    search_query: "gm auto parts",
    conversions: 34,
    cost: 21.33
  },
  {
    match_type: "phrase",
    search_query: "silverdo headlights",
    conversions: 63,
    cost: 244.05
  }
]

What I want to do is query off these data hashes and sum up the total number of conversions for a given report. I've looked though the Postgresql docs and it looks like you can only really do calculations on hashes, not arrays of hashes like this. Is what I'm trying to do possible in postgres? Do I need to make a temp table out of this array and do calculations off that? Or can I use a stored procedure?

I'm using Postgresql 9.4

EDIT The reason I'm not just using a regular, normalized table is that this is just one example of how report data could be structured. In my project, reports have to allow arbitrary keys, because they are populated by users uploading CSV's with any columns they like. It's basically just a way to get around having arbitrarily many, user-created tables.


Solution

  • What I want to do is query off these data hashes and sum up the conversions

    The fastest way should be with jsonb_populate_recordset(). But you need a registered row type for it.

    CREATE TEMP TABLE report_data (
    --   match_type text    -- commented out, because we only need ..
    -- , search_query text  -- .. conversions for this query
         conversions int
    -- , cost numeric
    );
    

    A temp table is one way to register a row type ad-hoc. More explanation in this related answer:

    Assuming a table report with report_id as PK for lack of inforamtion.

    SELECT r.report_id, sum(d.conversions) AS sum_conversions
    FROM   report r
    LEFT   JOIN LATERAL jsonb_populate_recordset(null::report_data, r.data) d ON true
    -- WHERE  r.report_id = 12345  -- only for given report?
    GROUP  BY 1;
    

    The LEFT JOIN ensures you get a result, even if data is NULL or empty or the JSON array is empty.

    For a sum from a single row in the underlying table, this is faster:

    SELECT d.sum_conversions
    FROM   report r
    LEFT   JOIN LATERAL (
       SELECT sum(conversions) AS sum_conversions
       FROM   jsonb_populate_recordset(null::report_data, r.data)
       ) d ON true
    WHERE  r.report_id = 12345;  -- enter report_id here
    

    Alternative with jsonb_array_elements() (no need for a registered row type):

    SELECT d.sum_conversions
    FROM   report r
    LEFT   JOIN LATERAL (
       SELECT sum((value->>'conversions')::int) AS sum_conversions
       FROM   jsonb_array_elements(r.data)
       ) d ON true
    WHERE  r.report_id = 12345;  -- enter report_id here
    

    Normally you would implement this as plain, normalized table. I don't see the benefit of JSON here (except that your application seems to require it, like you added).