Search code examples
sqljsonpostgresqldynamic-sqljsonb

Expand JSONB into columns with first row as template


Let's say you have a Postgres table of the form:

 f1  | f2    |   metadata   
-----+-------+-----------------------
 a   | 33    | {"f3": "d", "f4": "e"}
 b   | 20    | {"f3": "d", "f4": "g"}

With the metadata column being an unstructured JSON field. How can I query this table such that the result records contain fields f1, f2, f3, and f4, with the JSON expanded to fill these fields?

I know json_populate_record() can do this if you know f3 and f4 are the field names beforehand. But I don't. I want to use the key names in metadata of the first row as template for all other rows.

In other words: I want the query's result columns to be f1, f2 + all the keys of the 1st row's JSON data. Any other keys that do not conform to the ones in the first row shall be dropped.


Solution

  • There is no natural "first" row. You need to define "first". Assuming the first row for ORDER BY f1, f2.

    If you don't know the number and data types of columns to be expected this cannot be done in a single SQL statement. SQL demands to know the return type, at least at call time. But there are various ways to do it with two statements.

    Step 0

    Here is a proper test setup to work with:

    CREATE TABLE tbl (f1 text, f2 int, metadata jsonb);
    INSERT INTO tbl VALUES
      ('a', 33, '{"f3": "d", "f4": "e"}')  -- "first" row
    , ('b', 20, '{"f3": "d", "f4": "g"}')  -- same keys
    , ('c', 40, '{"f7": "x", "f4": "o"}')  -- one matching key
    , ('d', 50, '{"f3": "o", "f9": "x", "f123": "z"}')  -- one match, two miss
    , ('e', 60, '{"x": "d", "y": "g"}')    -- no match
    , ('f', 33, '{"f3": 1, "f4": false}'); -- numeric and boolean
    

    Step 1

    Once you know number, names and types of the result columns, it gets simple, like you mentioned in the question. I suggest you create a temporary table to provide the row type for jsonb_populate_record():

    BEGIN;
    CREATE TEMP TABLE tmp(f3 text, f4 text) ON COMMIT DROP;
    
    SELECT f1, f2, meta.*
    FROM   tbl, jsonb_populate_record(NULL::tmp, metadata) meta;
    ROLLBACK;  -- or: COMMIT;
    

    ON COMMIT DROP drops the table at the end of the transaction automatically. You may or may not want that. If you do, use a single transaction for both commands.

    Temporary tables are only visible in the same session, so there is no naming conflict with multiple transactions doing the same.

    If you don't have this information, it get's more sophisticated.

    Step 2

    You can do the same with a DO command and dynamic SQL:

    DO
    $do$
    BEGIN
    EXECUTE 'CREATE TEMP TABLE tmp(f3 text, f4 text) ON COMMIT DROP';
    END
    $do$;
    

    Step 3

    Since we do not actually know number and names of output columns we extract that information from the first row. Assumptions:

    • Data type is text for all columns.
    • The "first" row has at least one key in the jsonb column.
    • Existing column names "f1" and "f2" are not repeated as keys in the JSON column. (Postgres allows duplicate names among output columns, but some clients have problems with that - and it's rather confusing.)

    DO
    $do$
    BEGIN
    
    EXECUTE (
       SELECT (SELECT 'CREATE TEMP TABLE tmp('
                    || string_agg(quote_ident(k), ' text, ')  -- f3 text, f4 
                    || ' text) ON COMMIT DROP'
               FROM   jsonb_object_keys(metadata) k)
       FROM   tbl
       ORDER  BY f1, f2
       LIMIT  1
       );
    
    END
    $do$;
    
    SELECT f1, f2, meta.*
    FROM   tbl, jsonb_populate_record(NULL::tmp, metadata) meta;
    

    Voilá.

    Be sure to properly escape key names with quote_ident() or similar.

    If column names were known in advance ...
    (addressing your comment), you could simply:

    SELECT f1, f2, metadata->>'f3', metadata->>'f4'
    FROM   tbl;
    

    For wide rows, jsonb_populate_record() is much more convenient, though. You can still use the dynamic solution, or you persist a table or a type and use that.

    Alternative

    If your second command can depend on your first, you could also just build the simple statement dynamically and execute that in another call:

    SELECT (SELECT 'SELECT f1, f2, metadata->>'
                 || string_agg(format('%1$L AS %1$I', k), ', metadata->>')
                 || ' FROM tbl'
            FROM   jsonb_object_keys(metadata) k)
    FROM   tbl
    ORDER  BY f1, f2
    LIMIT  1;
    

    Returns above simple query as text. Execute that as second command ... Execution may be a bit faster, but it requires two round-trips to the server while the first solution can make do with just one ... You decide.

    Using format() here to simplify the safe query string concatenation.