Search code examples
snowflake-cloud-data-platformpivotaggregate

Filtering a pivot to only include columns with data [Snowflake]


Somewhat of an unusual request. We have a software that asks survey questions. Any survey can include any number of questions (From 1 to 30 easily, but there is no limit).

The request from the user is to pull the data out with a line for each survey, response_ID and the questions pivoted. I have accomplished this so the data is in the format of:

Survey ID Response_ID Question1 Question2... QuestionK Q1 Q2
ABC 1 1 Null Null NULL NULL
ABC 2 Null 4 NULL NULL NULL
DEF 1 Null Null NULL 10 NULL
GHI 1 10 NULL NULL
GHI 2 NULL 5 NULL

Where Survey ABC has 2 questions scored from 1-10 Survey DEF has 1 question but the question was non-standard so it show up in column Q1 Survey GHI has 2 questions scored from 1-10

If we select survey DEF in our BI tool, we don't want to show columns QUestion1-QuestionK, we just want the columns Q1 and Q2

The way I've thought to do this is to Group by the Survey_ID and Response_ID, but we don't know how many columns there are and so we can't aggregate the rest of the columns

To filter by survey ID and then make a temporary table on just those results and exclude Survey_ID, Response_ID and coalesce(*) as all cols and then only show olumns with data, but this seems to return all columns as I think I need to aggregate first and then have only the columns with data

The desired Output:in the BI tool Filter on SUrvey ID: DEF

Survey Responses Q1
DEF 1 10

Filter on Survey ID: ABC

Survey ID Responses Question1 Question 2
ABC 2 1 4

Filter: GHI

Survey ID Response Question1 Question 2
GHI 2 10 5

Any help on the aggregation, removing columns that are null, etc. would be appreciated.

select survey_ID, Response_ID, max(\*)
from surveypivot
Where survey_ID = 'DEF'
group by Survey_ID,Response_ID

Solution

  • Disclaimer: It is a strange request to generate column list dynamically based on filter and personally I would not recommend such code to be used.

    With that being said, Snowflake SQL is expressive enough to generate such output:

    WITH cte AS (
      SELECT *, OBJECT_CONSTRUCT(* EXCLUDE(SURVEY_ID, RESPONSE_ID)) AS obj
      FROM tab
      WHERE Survey_ID = 'GHI'
    ), cte_unpivot AS (
      SELECT SURVEY_ID, COUNT(*) OVER(PARTITION BY SURVEY_ID) AS RESPONSES, 
             KEY, MAX(VALUE)::NUMBER AS VALUE
      FROM cte
      ,TABLE(FLATTEN(obj))
      GROUP BY SURVEY_ID, KEY
    )
    SELECT *
    FROM cte_unpivot
    PIVOT (MAX(VALUE) FOR KEY IN (ANY));
    

    For sample data:

    CREATE OR REPLACE TABLE tab(Survey_ID TEXT, Response_ID INT, Question1 INT,
                                Question2 INT, QuestionK INT,Q1 INT, Q2 INT)
    AS    SELECT 'ABC',1 , 1,Null,Null, NULL,NULL
    UNION SELECT 'ABC', 2,  Null,4,NULL,NULL,NULL
    UNION SELECT 'DEF', 1,  Null ,Null, NULL,10 ,NULL
    UNION SELECT 'GHI' ,1,  10, NULL, NULL, NULL, NULL
    UNION SELECT 'GHI', 2, NULL, 5, NULL, NULL, NULL;
    

    Output:

    enter image description here

    Related: