Search code examples
snowflake-cloud-data-platformwindow-functionspartitioning

Snowflake query a partition with filter condition


I have a Key-value based records in the Snowflake table, where for a given product_id there are dozens of Key-value pairs records. See the example below:

enter image description here

with t1 (product_id, key, value) as 
    (
    select 101, 'grade',                'high' union all
    select 101, 'expense_cost',     'high' union all
    select 101, 'maintenance_cost', 'medium' union all
    select 102, 'grade',                'medium' union all
    select 102, 'expense_cost',     'high' union all
    select 103, 'expense_cost',     'high' union all
    select 103, 'maintenance_cost', 'medium' 
    )
    select * from t1;

Given this data model, the requirement is to fetch product_ids that match the filter criteria of key values.

Example 1: Fetch all product_id where key=(grade) is having value=(high or medium) & key=(expense_cost) has value=(high)

Example 2: Fetch all product_id where key=(grade) has value=(high) & key=(maintenance_cost) has value=(high or medium)

I am able to solve this requirement using a Snowflake PIVOT function, which first converts Key-Value data structure to Columnar data structure and then applies a Filter condition using the WHERE clause. Is there a better way to solve this problem without using PIVOTs, for example by using some Window functionality, etc.?

My PIVOT based solution for Example 1:

with t1 (product_id, key, value) as 
(
select 101, 'grade',                'high' union all
select 101, 'expense_cost',     'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade',                'medium' union all
select 102, 'expense_cost',     'high' union all
select 103, 'expense_cost',     'high' union all
select 103, 'maintenance_cost', 'medium' 
)
select * from (
select product_id, key, value 
  from t1 
  where key in ('grade','expense_cost','maintenance_cost')
) pivot(min(value) for key in ('grade','expense_cost','maintenance_cost'))
as p (product_id, grade, expense_cost, maintenance_cost)
where grade in ('high','medium')
and expense_cost in ('high');

NOTE: IN case of using Window function, the output must only the contain the rows which qualify or pass the filter condition. The output must not contain all the records in the qualified partition.


Solution

  • Adding my solution below which is an improvement on @Lukasz Szozda's solution which had the following 2 problems:

    a. For each Attribute in the Filter criteria his code needed to add an additional Window function that increased the Snowflake computation time linearly.

    b. There were dozens of Attributes that needed to be removed from the resultset and adding so many Window functions for each of those Attributes would make the SQL query very bloated.

    Here is my improved code:

    with t1 (product_id, key, value) as 
    (
    select 101, 'grade',                'high' union all
    select 101, 'expense_cost',     'high' union all
    select 101, 'maintenance_cost', 'medium' union all
    select 102, 'grade',                'medium' union all
    select 102, 'expense_cost',     'high' union all
    select 103, 'expense_cost',     'high' union all
    select 103, 'maintenance_cost', 'medium' 
    )
    select * , 
     (CASE 
          WHEN key = 'grade' AND value IN ('medium', 'high') THEN TRUE
          WHEN key = 'expense_cost' AND value = 'high' THEN TRUE
          ELSE FALSE
      END) AS is_allowed
    from t1
    WHERE key IN ('grade', 'expense_cost')
    QUALIFY COUNT_IF(is_allowed=TRUE) OVER(PARTITION BY product_id) = 2;