I have a Key-value based records in the Snowflake table, where for a given product_id there are dozens of Key-value pairs records. See the example below:
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select * from t1;
Given this data model, the requirement is to fetch product_ids that match the filter criteria of key values.
Example 1: Fetch all product_id where key=(grade) is having value=(high or medium) & key=(expense_cost) has value=(high)
Example 2: Fetch all product_id where key=(grade) has value=(high) & key=(maintenance_cost) has value=(high or medium)
I am able to solve this requirement using a Snowflake PIVOT
function, which first converts Key-Value data structure to Columnar data structure and then applies a Filter
condition using the WHERE
clause. Is there a better way to solve this problem without using PIVOTs, for example by using some Window functionality, etc.?
My PIVOT based solution for Example 1:
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select * from (
select product_id, key, value
from t1
where key in ('grade','expense_cost','maintenance_cost')
) pivot(min(value) for key in ('grade','expense_cost','maintenance_cost'))
as p (product_id, grade, expense_cost, maintenance_cost)
where grade in ('high','medium')
and expense_cost in ('high');
NOTE: IN case of using Window function, the output must only the contain the rows which qualify or pass the filter condition. The output must not contain all the records in the qualified partition.
Adding my solution below which is an improvement on @Lukasz Szozda's solution which had the following 2 problems:
a. For each Attribute in the Filter criteria his code needed to add an additional Window function that increased the Snowflake computation time linearly.
b. There were dozens of Attributes that needed to be removed from the resultset and adding so many Window functions for each of those Attributes would make the SQL query very bloated.
Here is my improved code:
with t1 (product_id, key, value) as
(
select 101, 'grade', 'high' union all
select 101, 'expense_cost', 'high' union all
select 101, 'maintenance_cost', 'medium' union all
select 102, 'grade', 'medium' union all
select 102, 'expense_cost', 'high' union all
select 103, 'expense_cost', 'high' union all
select 103, 'maintenance_cost', 'medium'
)
select * ,
(CASE
WHEN key = 'grade' AND value IN ('medium', 'high') THEN TRUE
WHEN key = 'expense_cost' AND value = 'high' THEN TRUE
ELSE FALSE
END) AS is_allowed
from t1
WHERE key IN ('grade', 'expense_cost')
QUALIFY COUNT_IF(is_allowed=TRUE) OVER(PARTITION BY product_id) = 2;