Search code examples
snowflake-cloud-data-platformdbt

How to parse json in snowflake to get the count based on certain field in json


[
  {
    "type": "ENTITLEMENT",
  },
  {
    "type": "ENTITLEMENT",
  },
  {
    "type": "ROLE"
  },
  {
    "type": "ACCESS_PROFILE"
  }
]

I have a column with the above json value in snowflake. I trying to get the count for a particular type. For example , I want the counts for type = ENTITLEMENT. This should be 2.

I could flatten this data. But this would produce a lot of duplicate data for columns ( i have a table with large number of columns).

Looking for ways to parse this json.

Things I have tried.

  1. Write a java udf to parse the the content of example
REATE OR REPLACE FUNCTION IDN_DATA.entitlement_counter(access array)
RETURNS NUMBER
    LANGUAGE java handler = 'JsonCounterWithFilter.entitlement_counter'
as
$$

public class JsonCounterWithFilter {

    public int entitlement_counter(String[] access) {
        int counter = 0;
        for(String acc :access) {
            if(acc.contains("ENTITLEMENT")) {
                counter++;
            }
        }
        return counter;
    }
}
$$;

this does not work as it would looks for the word entitlement anywhere, not just in the field type. I dont have access jackson library to parse this json.


Solution

  • A short JS UDF can do this count:

    create or replace function count_object_in_array(A array, T string, V string)
    returns string
    language javascript
    as
    $$
    return A.reduce((count, x) => count + (x[T] == V?1:0), 0)
    $$;
    

    For example:

    create or replace temp table stst as
    select parse_json('[
      {
        "type": "ENTITLEMENT",
      },
      {
        "type": "ENTITLEMENT",
      },
      {
        "type": "ROLE"
      },
      {
        "type": "ACCESS_PROFILE"
      }
    ]') a;
    
    
    select *, count_object_in_array(a, 'type', 'ENTITLEMENT')
    from stst;