I have the following Oracle SQL table, where product_id
101 and 103 have disparate values in columns attr1
and attr3
respectively.
data
| PRODUCT_ID | ATTR1 | ATTR2 | ATTR3 |
|------------|-------|-------|-------|
| 101 | a | x | z |
| 101 | a | x | zzz |
| 101 | aa | x | z |
| 102 | b | y | z |
| 102 | b | y | z |
| 103 | c | z | z |
| 103 | c | z | zz |
I want to get the following output which lists the disparate values and their count in the columns.
output
| PRODUCT_ID | DESCR | VALUE_COUNT |
|------------|------------------|--------------|
| 101 | Issue with attr1 | a(2), aa(1) |
| 101 | Issue with attr3 | z(2), zzz(1) |
| 103 | Issue with attr3 | z(1), zz(1) |
I wrote a query to get the result for just one column, but it will require quite an effort to write it for the actual data where I will need to check 20+ columns for disparate values. Any suggestions to make it more efficient?
query
WITH data AS (
SELECT 101 product_id, 'a' attr1, 'x' attr2, 'z' attr3 FROM dual UNION ALL
SELECT 101 product_id, 'a' attr1, 'x' attr2, 'zzz' attr3 FROM dual UNION ALL
SELECT 101 product_id, 'aa' attr1, 'x' attr2, 'z' attr3 FROM dual UNION ALL
SELECT 102 product_id, 'b' attr1, 'y' attr2, 'z' attr3 FROM dual UNION ALL
SELECT 102 product_id, 'b' attr1, 'y' attr2, 'z' attr3 FROM dual UNION ALL
SELECT 103 product_id, 'c' attr1, 'z' attr2, 'z' attr3 FROM dual UNION ALL
SELECT 103 product_id, 'c' attr1, 'z' attr2, 'zz' attr3 FROM dual
), d1 AS (
SELECT product_id, 'Issue with attr1' descr
FROM data
GROUP BY product_id
HAVING COUNT(DISTINCT attr1) > 1
), d2 AS (
SELECT DISTINCT d1.product_id, d1.descr, data.attr1, COUNT(attr1) OVER (PARTITION BY attr1) cnt
FROM d1
INNER JOIN data
ON d1.product_id = data.product_id
)
SELECT product_id, descr, LISTAGG(attr1 || '(' || cnt || ')', ', ') WITHIN GROUP (ORDER BY product_id) value_count
FROM d2
GROUP BY product_id, descr
;
You may unpivot all the attributes into separate rows, compute rowcount per attribute and value and compare it with the rowcount per product_id
. Then aggregate errors back with listagg
.
This will require only to add more columns to inpivot ... for ...
.
with prep as (
select
sample.*
/*Rowcount per product_id*/
, count(1) over(partition by product_id) as rowcnt
from sample
)
, unp as (
select
product_id,
/*Classify the issue*/
case
/*Count per value*/
when count(1) != rowcnt
then 'Issue with ' || col
end as issue,
/*Count per value*/
coalesce(val, 'NULL') || '(' || count(1) || ')' as cnt
from prep
unpivot include nulls (
val for col in (attr1, attr2, attr3)
)
group by
product_id,
col,
val,
rowcnt
)
/*Aggregate all*/
select
product_id,
issue,
listagg(cnt, ', ') as value_count
from unp
where issue is not null
group by
product_id,
issue
order by 1, 2
PRODUCT_ID | ISSUE | VALUE_COUNT |
---|---|---|
101 | Issue with ATTR1 | a(2), aa(1) |
101 | Issue with ATTR3 | z(2), zzz(1) |
103 | Issue with ATTR3 | z(1), zz(1) |
UPD: By default unpivot removes
nullvalues from output. To include them you need to add
include nulls`.
If you add one more row eith null
to your sample data:
insert into sample (PRODUCT_ID, ATTR1, ATTR2, ATTR3)
values (103, 'c', 'z', null)
the above (modified) query will return:
PRODUCT_ID | ISSUE | VALUE_COUNT |
---|---|---|
101 | Issue with ATTR1 | a(2), aa(1) |
101 | Issue with ATTR3 | z(2), zzz(1) |
103 | Issue with ATTR3 | z(1), zz(1), NULL(1) |