Tags: sql, postgresql, amazon-redshift, plpgsql

Create a temporary table in Redshift using a JSON string format parameter


I have a Redshift stored proc signature that looks like:

CREATE OR REPLACE PROCEDURE get_filtered_results
(
    p_id smallint,
    p_filters varchar(65535), 
    rs_out REFCURSOR
)
LANGUAGE plpgsql

The p_filters parameter is a JSON-formatted string:

[{idToFilter: 1, range:'2:2', idChildren: 4, description: 'blabla1'}, {idToFilter: 2, range:'3:5', idChildren: 3, description: 'blabla2'},
{idToFilter: 3, range:'2:6', idChildren: 2, description: 'blabla3'}]

I would like to convert that string into a temporary table where every property is a column and every record of the array is a row, using a query similar to:

    CREATE TEMPORARY TABLE tmp_filter
    AS
    (
        SELECT * FROM JSON_PARSE(p_filters)
    );

Is there an easy way to do it?


Solution

  • So let's break this into pieces as I said before.

    #1 Convert the string into rows and columns

    The first issue here is that your string is not valid JSON. JSON requires double quotes, not single quotes, but that is easily fixed. However, the string is still not valid, as the identifiers (keys) need to be quoted as well. So when you have:

    [{idToFilter: 1, range:'2:2', idChildren: 4, description: 'blabla1'}, {idToFilter: 2, range:'3:5', idChildren: 3, description: 'blabla2'}, {idToFilter: 3, range:'2:6', idChildren: 2, description: 'blabla3'}]

    it really needs to be:

    [{"idToFilter": 1, "range":"2:2", "idChildren": 4, "description": "blabla1"}, {"idToFilter": 2, "range":"3:5", "idChildren": 3, "description": "blabla2"}, {"idToFilter": 3, "range":"2:6", "idChildren": 2, "description": "blabla3"}]

    You'll need to fix your input string (at least the quoting of the identifiers) for this to work.

    Then we need to convert the string to SUPER and unroll it. The SQL for this looks like:

    -- Test data: the user's string with the missing identifier quotes added (still single-quoted)
    create table test as select '[{\'idToFilter\': 1, \'range\':\'2:2\', \'idChildren\': 4, \'description\': \'blabla1\'}, {\'idToFilter\': 2, \'range\':\'3:5\', \'idChildren\': 3, \'description\': \'blabla2\'},
    {\'idToFilter\': 3, \'range\':\'2:6\', \'idChildren\': 2, \'description\': \'blabla3\'}]' as p_filters;

    -- Needed because the JSON keys contain uppercase characters
    SET enable_case_sensitive_identifier TO true;

    with fix_quotes as (
        -- swap the single quotes for double quotes so the string is valid JSON
        select replace(p_filters, '\'', '"') as p_filters from test
    ),
    json_value as (
        -- parse the now-valid JSON text into a SUPER value
        SELECT JSON_PARSE(p_filters) as json from fix_quotes
    )
    -- unroll the SUPER array: one output row per array element
    select i."idToFilter", i.range, i."idChildren", i.description
    from json_value j, j.json i;
    

    The first CTE in this example changes the single quotes to double quotes. (I manually added the missing quotes around the identifiers in the initial string.) The second CTE converts the value to SUPER. The top-level SELECT unrolls the SUPER array into one row per element.

    Also note that you'll need to enable case-sensitive identifiers (enable_case_sensitive_identifier) in your Redshift session, since your JSON keys contain uppercase characters.

    From here it is a matter of coding up the rest of the steps, but let's see if this makes sense first. This really is the meat of the question. The remaining steps are wrapping this query in a CTAS and putting it all in a stored proc that takes the input string; a rough sketch of that wrapper follows. These steps are straightforward, but I can help if more is needed.
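
    To make that last step concrete, here is a rough sketch of what the wrapper could look like, under a few assumptions rather than as a tested implementation: the caller has already set enable_case_sensitive_identifier to true in the session, the identifiers in p_filters arrive single-quoted as in the test string above (so only the quote replacement is needed), and p_id is left unused since the question doesn't show how it is applied. Because plpgsql doesn't substitute variables into a CREATE TABLE AS statement, the sketch builds the statement as a string and runs it with EXECUTE; QUOTE_LITERAL re-embeds the fixed-up JSON text as a literal, and CHR(39)/CHR(34) stand in for the single- and double-quote characters to sidestep escaping.

    CREATE OR REPLACE PROCEDURE get_filtered_results
    (
        p_id smallint,              -- not used here; kept to match the original signature
        p_filters varchar(65535),
        rs_out INOUT REFCURSOR      -- INOUT so the cursor name is returned to the caller
    )
    AS $$
    BEGIN
        -- Start clean in case the procedure runs twice in the same session.
        DROP TABLE IF EXISTS tmp_filter;

        -- plpgsql won't substitute a variable into CREATE TABLE AS, so build the
        -- statement as a string and EXECUTE it. CHR(39) is a single quote and
        -- CHR(34) a double quote; QUOTE_LITERAL wraps the result as a literal.
        EXECUTE 'CREATE TEMPORARY TABLE tmp_filter AS '
             || 'WITH json_value AS ('
             || '  SELECT JSON_PARSE('
             || quote_literal(replace(p_filters, chr(39), chr(34)))
             || ') AS json'
             || ') '
             || 'SELECT i."idToFilter", i.range, i."idChildren", i.description '
             || 'FROM json_value j, j.json i';

        -- Hand the unrolled rows back through the cursor.
        OPEN rs_out FOR SELECT * FROM tmp_filter;
    END;
    $$ LANGUAGE plpgsql;

    Calling it would then follow the usual refcursor pattern: open a transaction, CALL get_filtered_results with a cursor name as the last argument, and FETCH ALL FROM that cursor before committing.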