Search code examples
sqlhiveapache-spark-sqlwindow-functions

How to Remove Overlap Using Window Function in Hive/Spark SQL


I have a table with customer and product information.

customer_id promotion_id   product_id  Rank
1               555            222      1
1               555            223      1 
1               555            999      1
1               556            154      2
1               556            222      2
1               556            278      2 
1               557            432      3
1               557            434      3
1               557            435      3

Some of the promotions overlap in product information. For example promotion_id 555 and 556 can be used on product 222. I would like to remove promotion_id 556 and rerank so that promotion 557 is ranked 2nd like so:

customer_id promotion_id   product_id  Rank
1               555            222      1
1               555            223      1 
1               555            999      1
1               557            432      2
1               557            434      2
1               557            435      2 

I've been messing around using the row_number()/rank() function in SQL but I can't seem to get it.


Solution

  • Using ROW_NUMBER:

    SELECT *, DENSE_RANK() OVER(ORDER BY promotion_id) AS new_rank
    FROM (SELECT *, DENSE_RANK() OVER(PARTITION BY CUSTOMER_ID, PRODUCT_ID 
                                      ORDER BY PRODUCT_ID, PROMOTION_ID) rn
          FROM tab) s
    WHERE rn = 1