Search code examples
hadoophivewindow-functionspartition

Calculating Rolling Weekly Spend in Hive using Window Functions


I need to develop a distribution of customer week long spend. Every time a customer makes a purchase, I want to know how much they've spent with us in the past week. I would like to do this with my Hive code.

My data set is somewhat similar to this:

Spend_Table

Cust_ID | Purch_Date | Purch_Amount  
1 | 1/1/19 | $10  
1 | 1/2/19 | $21  
1 | 1/3/19 | $30  
1 | 1/4/19 | $11  
1 | 1/5/19 | $21  
1 | 1/6/19 | $31  
1 | 1/7/19 | $41  
2 | 1/1/19 | $12  
2 | 1/2/19 | $22  
2 | 1/3/19 | $32  
2 | 1/5/19 | $42  
2 | 1/7/19 | $52  
2 | 1/9/19 | $62  
2 | 1/11/19 | $72  

So far, I've tried code that looks similar to this:

Select Cust_ID, 
Purch_Date, 
Purch_Amount,
sum(Purch_Amount) over (partition by Cust_ID order by unix_timestamp(Purch_Date) range between 604800 and current row) as Rolling_Spend
from Spend_Table



Cust_ID | Purch_Date | Purch_Amount | Rolling_Spend  
1 | 1/1/19 | $10 | $10  
1 | 1/2/19 | $21 | $31  
1 | 1/3/19 | $30 | $61  
1 | 1/4/19 | $11 | $72  
1 | 1/5/19 | $21 | $93  
1 | 1/6/19 | $31 | $124  
1 | 1/7/19 | $41 | $165  
2 | 1/1/19 | $12 | $12  
2 | 1/2/19 | $22 | $34  
2 | 1/3/19 | $32 | $66  
2 | 1/5/19 | $42 | $108  
2 | 1/7/19 | $52 | $160  
2 | 1/9/19 | $62 | $188  
2 | 1/11/19 | $72 | $228  

I believe the issue is with my range between, because it appears to be grabbing the preceding number of rows. I was expecting it to grab data within the preceding amount of seconds (604800 being 6 days in seconds).

Is what I'm trying to do feasible? I can't do the previous 6 rows, since not every customer makes a purchase every single day, like customer 2. Any help is greatly appreciated!


Solution

  • SELECT *, sum(some_value) OVER (
            PARTITION BY Cust_ID 
            ORDER BY CAST(Purch_Date AS timestamp) 
            RANGE BETWEEN INTERVAL 7 DAYS PRECEDING AND CURRENT ROW
         ) AS cummulativeSum FROM Spend_Table
    

    https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics