Search code examples
arraysjsonsnowflake-cloud-data-platformflatten

Loading json file into snowflake with Kafka Connector


i'm loading a json file through kafka connector, so the connector is creating an internal stage, pipe and destination table with multiple VARIANT data type columns.

The json has a specific attributes that are created like this:

"data_1_attribute": [
        {
            "data_1_id": {
                "data_1": "1111",
                "data_2": 1
            },
            "data_2_id": null
        },
        ...
    ],
    "data_2_attribute": {
        "data_3_id": 111111,
        "data_4_id": "XXXXX",
        "data_5_id": [
            {
                "data_3": 111
            },
            {
                "data_4": 111
            }
        ],
        "data_6_id": [
            {
                "data_5": 222,
                "data_6": "YYY"
            }
        ],
        "data_7_id": [
            {
                "data_7": 333,
                "data_8": "ZZZ"
            },
            ...
        ]
    }

The main difference between data_1_attribute and data_2_attribute is that 1 is identified as a single object ("data_2_attribute": { ) and this is understood in a correct way by the kafka connector, but the first attribute ("data_1_attribute": [ ) i think is understood as an array of objects and when the VARIANT column is create, shows me the data like this

[
  "{\"data_1_id\":{\"data_2\":1,\"data_1\":"1111"},\"data_2_id\":null}"
]

Where if i do a lateral flatten the result in the VALUE column is:

"{\"data_1_id\":{\"data_2\":1,\"data_1\":"1111"},\"data_2_id\":null}"

And i don't know how to access and flatten this information.

I was trying to used an ARRAY_FLATTEN but i don't understand how to use it, any thoughts? Thanks


Solution

  • Try using PARSE_JSON for record_content column, this should fix your issue.

    SELECT PARSE_JSON(record_content) ....