Search code examples
sqlduplicateshierarchy

SQL - Manage duplicates using hierarchy


Consider the following table.

Customer Category Key
Ajax VIP 1A
Zeus Retail 2B
Hera Retail 3C
Ajax Retail 1A

Notice the duplicate 1A value, which is both VIP and Retail.

How to manage these duplicates using a hierarchy, where if VIP, then keep VIP and remove retail ?

End result should be

Customer Category Key
Ajax VIP 1A
Zeus Retail 2B
Hera Retail 3C

Thanks

I've tried assigning values as such:

VIP = 100 Retail = 1

Then group by Key and summing the new column. If results are within a specific range (>100) then the customer gets assigned VIP, otherwise Retail. In the end, original Category column is removed, replaced by Computed_Category.

Looking for a more elegant method.

Edit:

There are 17 categories, with each superseding those below in rank.

Example here

SELECT *, ROW_NUMBER() OVER (PARTITION BY CUSTNUMBER ORDER BY CASE WHEN CLASS = 'X' THEN 1 WHEN CLASS = 'Y' THEN 2 ELSE 3 END, UNIQUEID) RN FROM (VALUES (100,'AA','Z') , (200,'BB','X') , (300,'CC','X') , (400,'DD','Y') , (100,'AA','Y') , (100,'AA','X') ) t1 (CUSTNUMBER, UNIQUEID, CLASS)

ORDER BY UNIQUEID, CLASS

End result should remove 100-AA-Y and 100-AA-Z.

If a category is not found, use the next one in the hierarchy.


Solution

  • You could use row_number function with order by case expression as the following:

    Select Customer, Category, Key_
    From
    (
      Select *,
        ROW_NUMBER() Over (Partition By Customer Order By 
                             Case When Category = 'VIP' Then 1 
                                  When Category = 'Corporate' Then 2
                                  When Category = 'Retail' Then 3
                                  -- continue for all categoreis 
                                  Else 17 
                              End, key_) rn
      From table_name
    ) T
    Where rn = 1
    

    If you could have a separate table that holds each category and it's priority, you may simplify the query as the following:

    Create table Categories(Category VARCHAR(50), priority INT);
    Insert Into Categories Values 
      ('VIP', 1),
      ('Corporate', 2),
      ('Retail', 3); -- list all categories
    

    And the query:

    Select Customer, Category, Key_
    From
    (
      Select T.Customer, T.Category, T.Key_,
        ROW_NUMBER() Over (Partition By T.Customer Order By C.priority, T.key_) rn
      From table_name T LEFT JOIN Categories C
      ON T.Category = C.Category
    ) T
    Where rn = 1
    

    See demo.