Search code examples
sqloracle-databasewindowlistagg

SQL: How to find disparate values in columns and show their count?


I have the following Oracle SQL table, where product_id 101 and 103 have disparate values in columns attr1 and attr3 respectively.

data

| PRODUCT_ID | ATTR1 | ATTR2 | ATTR3 |
|------------|-------|-------|-------|
| 101        | a     | x     | z     |
| 101        | a     | x     | zzz   |
| 101        | aa    | x     | z     |
| 102        | b     | y     | z     |
| 102        | b     | y     | z     |
| 103        | c     | z     | z     |
| 103        | c     | z     | zz    |

I want to get the following output which lists the disparate values and their count in the columns.

output

| PRODUCT_ID | DESCR            | VALUE_COUNT  |
|------------|------------------|--------------|
| 101        | Issue with attr1 | a(2), aa(1)  |
| 101        | Issue with attr3 | z(2), zzz(1) |
| 103        | Issue with attr3 | z(1), zz(1)  |

I wrote a query to get the result for just one column, but it will require quite an effort to write it for the actual data where I will need to check 20+ columns for disparate values. Any suggestions to make it more efficient?

query

WITH data AS (
   
   SELECT 101 product_id, 'a'    attr1, 'x'  attr2, 'z'     attr3    FROM dual UNION ALL
   SELECT 101 product_id, 'a'    attr1, 'x'  attr2, 'zzz'   attr3    FROM dual UNION ALL
   SELECT 101 product_id, 'aa'   attr1, 'x'  attr2, 'z'     attr3    FROM dual UNION ALL
   SELECT 102 product_id, 'b'    attr1, 'y'  attr2, 'z'     attr3    FROM dual UNION ALL
   SELECT 102 product_id, 'b'    attr1, 'y'  attr2, 'z'     attr3    FROM dual UNION ALL
   SELECT 103 product_id, 'c'    attr1, 'z'  attr2, 'z'     attr3    FROM dual UNION ALL
   SELECT 103 product_id, 'c'    attr1, 'z'  attr2, 'zz'    attr3    FROM dual
   
), d1 AS (

   SELECT product_id, 'Issue with attr1' descr 
   FROM data
   GROUP BY product_id 
   HAVING COUNT(DISTINCT attr1) > 1

), d2 AS (
      
   SELECT DISTINCT d1.product_id, d1.descr, data.attr1, COUNT(attr1) OVER (PARTITION BY attr1) cnt
   FROM d1
   INNER JOIN data
      ON d1.product_id = data.product_id
   
)

SELECT product_id, descr, LISTAGG(attr1 || '(' || cnt || ')', ', ') WITHIN GROUP (ORDER BY product_id) value_count
FROM d2
GROUP BY product_id, descr
;

Solution

  • You may unpivot all the attributes into separate rows, compute rowcount per attribute and value and compare it with the rowcount per product_id. Then aggregate errors back with listagg.

    This will require only to add more columns to inpivot ... for ....

    with prep as (
      select
        sample.*
        /*Rowcount per product_id*/
        , count(1) over(partition by product_id) as rowcnt
      from sample
    )
    , unp as (
      select
        product_id,
        /*Classify the issue*/
        case
          /*Count per value*/
          when count(1) != rowcnt
          then 'Issue with ' || col
        end as issue,
        /*Count per value*/
        coalesce(val, 'NULL') || '(' || count(1) || ')' as cnt
      from prep
      unpivot include nulls (
        val for col in (attr1, attr2, attr3)
      )
      group by
        product_id,
        col,
        val,
        rowcnt
    )
    /*Aggregate all*/
    select
      product_id,
      issue,
      listagg(cnt, ', ') as value_count
    from unp
    where issue is not null
    group by
      product_id,
      issue
    order by 1, 2
    
    PRODUCT_ID ISSUE VALUE_COUNT
    101 Issue with ATTR1 a(2), aa(1)
    101 Issue with ATTR3 z(2), zzz(1)
    103 Issue with ATTR3 z(1), zz(1)

    fiddle

    UPD: By default unpivot removes nullvalues from output. To include them you need to addinclude nulls`.

    If you add one more row eith null to your sample data:

    insert into sample (PRODUCT_ID, ATTR1, ATTR2, ATTR3)
    values (103, 'c', 'z', null)
    

    the above (modified) query will return:

    PRODUCT_ID ISSUE VALUE_COUNT
    101 Issue with ATTR1 a(2), aa(1)
    101 Issue with ATTR3 z(2), zzz(1)
    103 Issue with ATTR3 z(1), zz(1), NULL(1)

    fiddle