Search code examples
sqlgroup-bysubquerysnowflake-cloud-data-platform

Exploding an object column, concatenating the results and adding them to the original rows


I have a Snowflake table similar to this:

CREATE OR REPLACE VIEW EXAMPLE AS 
SELECT $1 AS ID_EXAMPLE, 
       PARSE_JSON($2) AS SOME_JSON
  FROM (VALUES ('a','[1,2]'),
               ('b','[3,4]'));

I want to create a derived view with the same structure, but with an extra column that is the string concatenation of the values inside the JSON column.

This attempt at query for the new view doesn't work:

SELECT E.*,
       EXTRA.VALUE AS EXTRA_COL
 FROM EXAMPLE E
 CROSS JOIN LATERAL 
   (SELECT LISTAGG(VALUE,'/') AS VALUE 
      FROM TABLE(FLATTEN(INPUT => E.SOME_JSON))) EXTRA;             

It fails with "SQL compilation error: Unsupported subquery type cannot be evaluated".

This query using GROUP BY does work:

SELECT E.*,        
       LISTAGG(EXTRA.VALUE,'/') AS EXTRA_COL
FROM EXAMPLE E
CROSS JOIN LATERAL FLATTEN(INPUT => E.SOME_JSON) EXTRA
GROUP BY E.ID_EXAMPLE, E.SOME_JSON;

The problem is that my real table has a huge number of columns, so the GROUP BY will be cumbersome to write and maintain. Is there some way of formulating the query that doesn't involve writing the complete GROUP BY clause?


Solution

  • So moving the create to a CTE to just prove the error still happens in this form:

    WITH the_view AS (
      SELECT $1 AS ID_EXAMPLE, 
           PARSE_JSON($2) AS SOME_JSON
      FROM (VALUES ('a','[1,2]'),
                   ('b','[3,4]'))
    )
    SELECT E.*,
           EXTRA.VALUE AS EXTRA_COL
     FROM the_view E
     CROSS JOIN LATERAL 
       (SELECT LISTAGG(VALUE,'/') AS VALUE 
          FROM TABLE(FLATTEN(INPUT => E.SOME_JSON))) EXTRA;  
    

    and it does, if we move the sub-select to it's own level CTE we can join on the id_column

    WITH example AS (
      SELECT $1 AS ID_EXAMPLE, 
           PARSE_JSON($2) AS SOME_JSON
      FROM VALUES ('a','[1,2]'),
                  ('b','[3,4]')
    ), sub_process AS (
      SELECT E.ID_EXAMPLE,        
         LISTAGG(EXTRA.VALUE,'/') AS EXTRA_COL
      FROM EXAMPLE E,
      TABLE(FLATTEN(INPUT => E.SOME_JSON)) EXTRA
      GROUP BY E.ID_EXAMPLE
    )
    SELECT e.*,
           s.EXTRA_COL
     FROM example AS e
     JOIN sub_process s ON E.ID_EXAMPLE = s.ID_EXAMPLE;  
    

    which gives:

    ID_EXAMPLE  SOME_JSON   EXTRA_COL
    a   [    1,    2  ]     1/2
    b   [    3,    4  ]     3/4
    

    that seems to solve the problem.