Search code examples
sqljsonsnowflake-cloud-data-platformvariant

How to flatten out a json list of strings in Snowflake


Hi I'm working in Snowflake with some relational data in a table that also includes a JSON column with a VARIANT data type format. I'm able to manipulate the JSON data and get most of the columns I want, but I'm having trouble with 1 that is a list (or array?) of data.

Below is what the VARIANT column looks like in its raw JSON format. This column is called product_attributes and the key I'm having problems with is web_categories.

{
  "data": {
    "availability": "available",
    "product_company": "macys",
    "id": "22345897290",
    "price": 5.99,
    "web_categories": [
      [
        "Beauty",
        "Makeup",
        "Eye Makeup",
        "Brows"
      ]
    ]
  }
}

I basically want to like group by that id attribute and then return a distinct count of the number of web categories for each individual product. In this case I'd expect that to return 4.

select product_attributes:data:id::string as id, count(distinct(t1.value)) as number_of_categories
from mytable, table(flatten(product_attributes:data:web_categories)) t1
group by 1;

But this query doesnt work, the value column that is returned by the table flatten command looks like ["Beauty", "Makeup", "Eye Makeup", "Brows"] so when it runs the count operation it only returns 1. Do I have to like split the string up by (",") and go that route? It's difficult because this isn't a normal attribute with elements broken up by {} which is what most examples look like, it's just using [].

It looks like there are 2 methods that are used to break down JSON data but I've tried using both and can't seem to get what I want.

lateral flatten(input => product_attributes:data:web_categories) table(flatten(product_attributes:data:web_categories)) t1

If anyone can help me out I'd appreciate it!


Solution

  • Here is one approach. You could do this in 1 flatten if size(web_categories)==1 always by doing web_catgories[0] in the first flatten

    with mytable as (select parse_json($1) product_attributes from values ('{
      "data": {
        "availability": "available",
        "product_company": "macys",
        "id": "22345897290",
        "price": 5.99,
        "web_categories": [
          [
            "Beauty",
            "Makeup",
            "Eye Makeup",
            "Brows"
          ]
        ]
      }}'))
      
    select product_attributes:data:id::string as id, count(distinct(t1.value)) as number_of_categories
    from mytable, lateral flatten(product_attributes:data:web_categories) t0, lateral flatten(t0.value) t1
    group by 1;
    
    ID  NUMBER_OF_CATEGORIES
    22345897290 4