Search code examples
sqlhive

Delete duplicates using dense rank


I have a sales data table with cust_ids and their transaction dates. I want to create a table that stores, for every customer, their cust_id, their last purchased date (on the basis of transaction dates) and the count of times they have purchased.

I wrote this code:

SELECT
    cust_xref_id, txn_ts,
    DENSE_RANK() OVER (PARTITION BY cust_xref_id ORDER BY CAST(txn_ts as timestamp) DESC) AS rank,
    COUNT(txn_ts)
FROM
    sales_data_table

But I understand that the above code would give an output like this (attached example picture)

enter image description here

How do I modify the code to get an output like :

enter image description here

I am a beginner in SQL queries and would really appreciate any help! :)


Solution

  • This would be an aggregation query which changes the table key from (customer_id, date) to (customer_id)

    SELECT
        cust_xref_id, 
        MAX(txn_ts) as last_purchase_date,
        COUNT(txn_ts) as count_purchase_dates
    FROM
        sales_data_table
    GROUP BY
        cust_xref_id