Tags: amazon-redshift · flatten · materialized-views · json-extract

Extract nested fields from DynamoDB JSON format in Redshift / unmarshall DynamoDB JSON to regular JSON


My requirement is to create a materialized view on Redshift containing flattened data from the nested DynamoDB JSON format. I have tried most of the approaches provided by AWS, as well as creating a function and a stored procedure, but none of them return the correct result. Can someone help me extract the data from the format below into columns using a materialized view on Redshift? Below is an example of the DynamoDB JSON format.

{ "Name": { "L": [ { "M": { "Field1":{ "s": "TRUE" }, "Field2":{ "s": "FALSE" }, "Field3":{ "s": "TRUE" } } } ] } }

The fields Field1, Field2, and Field3 need to be extracted into separate columns. PS: if the JSON format is wrong, please correct it.

I have tried most of the obvious routes (json_parse, json_extract_path_text, unpivot, etc.) and nothing is working.


Solution

  • I understand the confusion now. You need to use JSON_EXTRACT_PATH_TEXT() to parse named json keys but JSON_EXTRACT_ARRAY_ELEMENT_TEXT() to parse json arrays. You'll need to combine these to extract the information you desire (see below for the json_parse method).

    Your example should look like:

    select JSON_EXTRACT_PATH_TEXT(
               JSON_EXTRACT_ARRAY_ELEMENT_TEXT(
                   JSON_EXTRACT_PATH_TEXT('{ "Name": { "L": [ { "M": { "Field1":{ "s": "TRUE" }, "Field2":{ "s": "FALSE" }, "Field3":{ "s": "TRUE" } } } ] } }', 'Name', 'L'),
                   0),
               'M', 'Field1', 's')
           AS col_a;
    

    Because DynamoDB uses many levels of nesting, you have to unwrap the string one layer at a time.
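
    The same call chain works for all three fields at once; you just repeat the extraction per column. A sketch, assuming the JSON string lives in a column called doc of a table called src_table (both names are placeholders, not from the question):

    ```sql
    -- Repeat the nested extraction once per requested field, one column each.
    -- "src_table" and "doc" are assumed names for illustration only.
    select
        JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(JSON_EXTRACT_PATH_TEXT(doc, 'Name', 'L'), 0), 'M', 'Field1', 's') as field1,
        JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(JSON_EXTRACT_PATH_TEXT(doc, 'Name', 'L'), 0), 'M', 'Field2', 's') as field2,
        JSON_EXTRACT_PATH_TEXT(JSON_EXTRACT_ARRAY_ELEMENT_TEXT(JSON_EXTRACT_PATH_TEXT(doc, 'Name', 'L'), 0), 'M', 'Field3', 's') as field3
    from src_table;
    ```

    Note this still only reads element 0 of the "L" array; for multi-element arrays, see the SUPER approach below.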

    Alternatively, you can use Redshift's SUPER data type to hold the values and its built-in navigation capabilities to do what you intend. This has the advantage of being able to represent multiple values in the array element of the JSON (the solution above only gets the first element of the array). In the code below I have expanded your example to have 2 elements in the array "L", and this code extracts both values for Field1: Redshift produces a row for every array element, no matter how many there are. Also note that, since your JSON keys contain uppercase characters, we need to set Redshift to case-sensitive identifiers to be able to parse them.

    SET enable_case_sensitive_identifier TO true;
    with jsub as (
        select JSON_PARSE('{ "Name": { "L": [ { "M": { "Field1":{ "s": "TRUE" }, "Field2":{ "s": "FALSE" }, "Field3":{ "s": "TRUE" } } },
            { "M": { "Field1":{ "s": "FALSE" }, "Field2":{ "s": "TRUE" }, "Field3":{ "s": "FALSE" } } } ] } }')::super as j
    ),
    jname as (
        select s.j , j."Name"."L" list
        from jsub s
    )
    select s.j, index, a."M"."Field1"."s"
    from jname s, s."list" a at index;