Search code examples
sqljsonazure-databricks

JSON column expand databricks sql


I have the JSON column which and would like to expand it using databricks sql (no python).

column name is cdi_sentemailstatisticsdata, and each json value has its corresponding id too and sample data:

{"list":[{"Count":91,"EmailClient":"Other","Platform":"Desktop"},{"Count":20,"EmailClient":"Hotmail","Platform":"Desktop"},{"Count":5,"EmailClient":"Chrome Mobile","Platform":"Desktop"},{"Count":2,"EmailClient":"Opera","Platform":"Desktop"},{"Count":35,"EmailClient":"Chrome","Platform":"Desktop"},{"Count":21,"EmailClient":"Outlook","Platform":"Desktop"},{"Count":26,"EmailClient":"Gmail","Platform":"Desktop"},{"Count":1,"EmailClient":"Safari","Platform":"Desktop"},{"Count":1,"EmailClient":"Icloud","Platform":"Desktop"},{"Count":1,"EmailClient":"Thunderbird","Platform":"Desktop"},{"Count":1,"EmailClient":"Outlook.com","Platform":"Desktop"},{"Count":16,"EmailClient":"Chrome Mobile","Platform":"Mobile"},{"Count":22,"EmailClient":"Mobile Safari","Platform":"Mobile"},{"Count":8,"EmailClient":"Hotmail","Platform":"Mobile"},{"Count":6,"EmailClient":"Yahoo","Platform":"Mobile"},{"Count":4,"EmailClient":"Gmail","Platform":"Mobile"},{"Count":1,"EmailClient":"Chrome","Platform":"Mobile"}],"lastUpdate":"2020-11-11T13:31:14.3587369+00:00"}

Desired output should look like this:

enter image description here

Tried something like this but did not work:

SELECT
  json_element.Count AS Count,
  json_element.EmailClient AS EmailClient,
  json_element.Platform AS Platform
FROM
  cdi_sentemailstatisticsdata
LATERAL VIEW
  explode(from_json(json_column.list, 'array<struct<Count:int,EmailClient:string,Platform:string>>')) AS json_element;

Solution

  • You can use below SQL query.

    SELECT
        json_element.Count  AS  Count,
        json_element.EmailClient AS EmailClient,
         json_element.Platform AS Platform
    FROM
    (
    SELECT cdi_sentemailstatisticsdata as json_column from your_table_name
    ) t
    LATERAL  VIEW
    explode(from_json(json_column, 'struct<list:array<struct<Count:int,EmailClient:string,Platform:string>>,lastUpdate:string>').list) AS json_element;
    

    Output:

    enter image description here