Search code examples
sqlpostgresqljsonb

Query for a json property with the highest (max) date as value in a jsonb column


I have a postgres table with a jsonb column that gets inserted/updated as soon as a status is reached. I would like to query the latest status with it's date.

Given I have the following jsonb status column.

{
 "pending": "2018-01-12T12:34:41.785945+00:00",
 "started":"2018-01-10T15:52:41.785945+00:00",
 "processed":"2018-01-18T12:52:41.785945+00:00"
}

What would be the query to get?

"processed":"2018-01-18T12:52:41.785945+00:00"...

Basically select the property with the max date, Is this possible at all? And if so how would look the query?

EDIT: the Status JSON can have different properties at different times. So it not always the 3 'pending','started','processed'. There could only be pending or in future others.... The question is the latest status with it's date.


Solution

  • Let's create the table with couple of sample data.

    DROP TABLE IF EXISTS test;
    
    CREATE TABLE test(
    id integer,  
    status json  
    );
    
    INSERT INTO test VALUES (1, '{
     "pending": "2018-01-12T12:34:41.785945+00:00",
     "started":"2018-01-10T15:52:41.785945+00:00",
     "processed":"2018-01-18T12:52:41.785945+00:00"
    }');
    INSERT INTO test VALUES (2, '{
     "pending": "2018-01-12T12:31:41.785945+00:00",
     "started":"2018-01-20T15:55:41.785945+00:00",
     "processed":"2018-01-10T12:20:41.785945+00:00"
    }');
    

    I managed to get the max date property by extracting values from json to a separate table, and applying typical max function and case statement to get corresponding property of interest. This probably defeats the purpose of using json type (maybe try json_type(json) instead?)

    --- unpack values and put them into different columns
    WITH t1 AS(
    SELECT id, status->>'pending' AS pending, 
          status->>'started' AS started, 
          status->>'processed' AS processed       
    FROM test),
    --- find maximum time corresponding to each_id
    t2 AS (
    SELECT id, GREATEST(t1.pending, t1.processed, t1.started) AS latest
    FROM t1
    )
    --- join to get matching column that gave latest time
    SELECT t1.id, CASE WHEN t1.pending = t2.latest THEN 'pending'
                      WHEN t1.started = t2.latest THEN 'started'
                      WHEN t1.processed = t2.latest THEN 'processed' 
                      END AS max_status,
                  t2.latest AS max_time
    FROM t1
    JOIN t2
    ON t1.id = t2.id
    ;
    

    You can carry along other columns of interest for each id. This results

     id | max_status |             max_time             
    ----+------------+----------------------------------
      1 | processed  | 2018-01-18T12:52:41.785945+00:00
      2 | started    | 2018-01-20T15:55:41.785945+00:00
    (2 rows)
    

    EDIT:: After your edit, I made some changes to take care of arbitrary presence/absence of particular keys in your json field. Let's use json_each(json) to unpack key, value for each id to different columns

    SELECT id, (json_each(status)).* 
    FROM test
    WHERE id=1;
    
     id |    key    |               value                
    ----+-----------+------------------------------------
      1 | pending   | "2018-01-12T12:34:41.785945+00:00"
      1 | started   | "2018-01-10T15:52:41.785945+00:00"
      1 | processed | "2018-01-18T12:52:41.785945+00:00"
    (3 rows)
    

    Not quite what we want but almost there. First thing to notice is the column names key, value which we can use to transform data. Second, values are json objects and therefore we have to take care of double quotations after casting to text. trim handles this nicely. Rest is to partition your data by id, get maximum time, and filter rows to get corresponding status.

    WITH t1 AS(
        SELECT id, key as status, trim(both '"' from value::text) as time_of 
        FROM test, json_each(status)
    ),
    t2 as(
    SELECT id, status,
      to_timestamp(time_of, 'YYYY-MM-DD"T"HH24:MI:SS') as time_of,
      MAX(to_timestamp(time_of, 'YYYY-MM-DD"T"HH24:MI:SS')) 
            OVER(PARTITION by id) AS max_time       
    FROM t1)
    SELECT id, status, max_time 
    FROM t2
    WHERE time_of = max_time;