Search code examples
amazon-athenaamazon-cloudtrail

Athena Unnesting mixed types


I normally unnest like so :

SELECT h.field1, rp.p.key1, rp.p.key2
FROM
  mytable h
  CROSS JOIN UNNEST(h.field2) rp (p)

However, I am now looking at querying AWS CloudTrail data. Here the field that I want to unnest varies...

Sometimes it's a single object:

{"principal":{"dataLakePrincipalIdentifier":"arn:aws:iam::......

Other times it contains a wrapper object around an array of objects

{"entries":[{"id":"0","principal":{"dataLakePrincipalIdentifier":"arn:aws:iam::.........

The only way that I can currently think of to tackle this, is to create 2 separate queries; one to unpack the singletons, another to unpack the array entries - and then union the two results together.

I'd be grateful if anyone knows of a more efficient approach ?


Solution

  • You should be able to achieve this by checking on the existence of a specific key inside the object, and do the following:

    1. If the key is absent, this is definitely a single object of type (map(varchar, json). Convert it to look like a map(varchar, array(json).
    2. If the key is present, leave it as is.

    Let's say, we check on the existence of key entries:

    with data as (
      select CAST(json_parse(your_json_string) AS MAP(VARCHAR, json )) as p
      from mydataset.mytable
    )
    
    select cast(json_extract(j, '$.principle') as map(varchar, integer)) as record from (
         select 
         if(
           cardinality(filter(map_keys(p), x -> x = 'entries')) = 1, 
           p,
           MAP(ARRAY['entries'], array[cast(array[p] as json)])
         ) as x from data2
    ), unnest(cast(x['entries'] as array(json))) as z(j)
    

    Please note this part, specifically:

    if(
      cardinality(filter(map_keys(p), x -> x = 'entries')) = 1, 
      p,
      MAP(ARRAY['entries'], array[cast(array[p] as json)])
    )
    

    It does the job of converting a single object to look like a regular map(varchar, array(json)) which makes everything look alike.