Search code examples
sqldb2comparedb2-luwsql-null

How can I test whether all of the rows in a table are duplicated (except for one column)


I am working with a datawarehouse table that has can be split into claimed rows, and computed rows. I suspect that the computed rows are perfect duplicates of the claimed row (with the exception of the claimed/computed column).

I tried to test this using the except clause:

But all of the records were returned. I don't believe that this is possible, and I suspect it's due to null values.

Is there a way to compare the records which will compare nulls to nulls?

SELECT a, b, c FROM table WHERE clm_cmp_cd = 'clm'
    EXCEPT
SELECT a, b, c FROM table WHERE clm_cmp_cd = 'cmp'

But all of the records were returned. I don't believe that this is possible, and I suspect it's due to null values.

Is there a way to compare the records which will compare nulls to nulls?

edit: the solution should work with an arbitrary number of fields, with varying types. In this case, I have ~100 fields, 2/3 of which may have null values. This is a data warehouse, and some degree of denormalization must be expected.

edit: I tested the query while limiting myself to non-null columns, and I got the result I expected (nothing). But, I would still like to compare fields which potentially contain null values.


Solution

  • Your supposition would appear to be false. You might try this:

    select a, b, c,
           sum(case when clm_cmp_cd = 'clm' then 1 else 0 end) as num_clm,
           sum(case when clm_cmp_cd = 'cmp' then 1 else 0 end) as num_cmp
    from t
    group by a, b, c;
    

    This will show you the values of the three columns and the number of matches of each type.

    Your problem is probably that values that look alike are not exactly the same. This could be due to slight differences in floating point number or due to unmatched characters in the string, such as leading spaces.