Search code examples
arraysjsonpostgresql-9.3

How to convert JSON Array of Arrays to columns and rows


I'm pulling data from an API in JSON with a format like the example data below. Where essentially every "row" is an array of values. The API doc defines the columns and their types in advance. So I know the col1 is, for example, a varchar, and that col2 is an int.

CREATE TEMP TABLE dat (data json);
INSERT INTO dat
VALUES ('{"COLUMNS":["col1","col2"],"DATA":[["a","1"],["b","2"]]}');

I want to transform this within PostgreSQL 9.3 such that I end up with:

col1 | col2
------------
  a  |  1
  b  |  2

Using json_array_elements I can get to:

SELECT json_array_elements(data->'DATA') 
FROM dat

json_array_elements
json
---------
["a","1"]
["b","2"]

but then I can't figure out how to do either convert the JSON array to a PostgreSQL array so I can perform something like unnest(ARRAY['a','1'])


Solution

  • General case for unknown columns

    To get a result like

    col1 | col2
    ------------
      a  |  1
      b  |  2
    

    will require a bunch of dynamic SQL, because you don't know the types of the columns in advance, nor the column names.

    You can unpack the json with something like:

    SELECT
      json_array_element_text(colnames, colno) AS colname,
      json_array_element_text(colvalues, colno) AS colvalue,
      rn,
      idx,
      colno
    FROM (
      SELECT
        data -> 'COLUMNS' AS colnames,
        d AS colvalues,
        rn,
        row_number() OVER () AS idx
      FROM (
        SELECT data, row_number() OVER () AS rn FROM dat
      ) numbered
      cross join json_array_elements(numbered.data -> 'DATA') d
    ) elements
    cross join generate_series(0, json_array_length(colnames) - 1) colno;
    

    producing a result set like:

     colname | colvalue | rn | idx | colno 
    ---------+----------+----+-----+-------
     col1    | a        |  1 |   1 |     0
     col2    | 1        |  1 |   1 |     1
     col1    | b        |  1 |   2 |     0
     col2    | 2        |  1 |   2 |     1
    (4 rows)
    

    You can then use this as input to the crosstab function from the tablefunc module with something like:

    SELECT * FROM crosstab('
    SELECT
      to_char(rn,''00000000'')||''_''||to_char(idx,''00000000'') AS rowid,
      json_array_element_text(colnames, colno) AS colname,
      json_array_element_text(colvalues, colno) AS colvalue
    FROM (
      SELECT
        data -> ''COLUMNS'' AS colnames,
        d AS colvalues,
        rn,
        row_number() OVER () AS idx
      FROM (
        SELECT data, row_number() OVER () AS rn FROM dat
      ) numbered
      cross join json_array_elements(numbered.data -> ''DATA'') d
    ) elements
    cross join generate_series(0, json_array_length(colnames) - 1) colno;
    ') results(rowid text, col1 text, col2 text);
    

    producing:

            rowid        | col1 | col2 
    ---------------------+------+------
      00000001_ 00000001 | a    | 1
      00000001_ 00000002 | b    | 2
    (2 rows)
    

    The column names are not retained here.

    If you were on 9.4 you could avoid the row_number() calls and use WITH ORDINALITY, making it much cleaner.

    Simplified with fixed, known columns

    Since you apparently know the number of columns and their types in advance the query can be considerably simplified.

    SELECT
      col1, col2
    FROM (
      SELECT
        rn,
        row_number() OVER () AS idx,
        elem ->> 0 AS col1,
        elem ->> 1 :: integer AS col2
      FROM (
        SELECT data, row_number() OVER () AS rn FROM dat
      ) numbered
      cross join json_array_elements(numbered.data -> 'DATA') elem
      ORDER BY 1, 2
    ) x;
    

    result:

     col1 | col2 
    ------+------
     a    |    1
     b    |    2
    (2 rows)
    

    Using 9.4 WITH ORDINALITY

    If you were using 9.4 you could keep it cleaner using WITH ORDINALITY:

    SELECT
      col1, col2
    FROM (
      SELECT
        elem ->> 0 AS col1,
        elem ->> 1 :: integer AS col2
      FROM
        dat
      CROSS JOIN
        json_array_elements(dat.data -> 'DATA') WITH ORDINALITY AS elements(elem, idx)
      ORDER BY idx
    ) x;