Search code examples
sqlapache-sparkapache-iceberg

Extract from List of JSON


I have a string field: [{"et": "AS","ct":"MC"},{"et": "AT","ct":"TC"}, {"et": "AQ","ct":"EC"}]

I want to get the "ct" column values to be combined together as part of new column something like MC_TC_EC

The table is Iceberg Table. I have looked into https://spark.apache.org/docs/latest/api/sql/index.html#from_json to figure out how to extract the values from each json of the list but all in vain. Table name: custTable String field: custDetail New Field: custInitials


Solution

  • The following is untested but might do what you have asked for:

    SELECT 
      *, 
      array_join(
        transform(
          parsed, 
          x -> x.ct
        ), 
        '_'
      ) as custInitials 
    FROM 
      (
        SELECT 
          *, 
          from_json(custDetail, 'ARRAY<STRUCT<et: STRING, ct: STRING>>') as parsed 
        FROM 
          custTable
      ) t