Search code examples
hiveconditional-statementslagcumulative-sum

How to compute running sum if tag<>0 and reset to 0 if tag=0 in HIVE?


customer    txn_date    tag running_sum
A           1-Jan-17    1   1
A           2-Jan-17    1   2
A           3-Jan-17    1   3
A           4-Jan-17    1   4
A           5-Jan-17    1   5
A           6-Jan-17    1   6
A           7-Jan-17    0   0
A           8-Jan-17    1   1
A           9-Jan-17    1   2
A           10-Jan-17   1   3
A           11-Jan-17   0   0
A           12-Jan-17   0   0
A           13-Jan-17   1   1
A           14-Jan-17   1   2
A           15-Jan-17   0   0

How to get the running_sum and and reset the running_sum to zero if tag=0? Just like on the sample above. TIA


Solution

  • What you need to do is to create "groups" for each section of your 1s and 0s. You can do this by creating a boolean flag and then cumulatively summing over that column to get groups. From there you can cumulatively sum over your original tag column by each group you created in the sub-query.

    Query:

    SELECT customer
      , txn_date
      , tag
      , SUM(tag) OVER (PARTITION BY customer, flg_sum ORDER BY txn_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
    FROM (
      SELECT *
        , SUM(tag_flg) OVER (PARTITION BY customer ORDER BY txn_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flg_sum
      FROM (
        SELECT *
          , CASE WHEN  tag = 1 THEN 0 ELSE 1 END AS tag_flg
        FROM database.table ) x ) y
    

    Output:

    customer        txn_date        tag     running_sum
    A               2017-01-01      1       1
    A               2017-01-02      1       2
    A               2017-01-03      1       3
    A               2017-01-04      1       4
    A               2017-01-05      1       5
    A               2017-01-06      1       6
    A               2017-01-07      0       0
    A               2017-01-08      1       1
    A               2017-01-09      1       2
    A               2017-01-10      1       3
    A               2017-01-11      0       0
    A               2017-01-12      0       0
    A               2017-01-13      1       1
    A               2017-01-14      1       2
    A               2017-01-15      0       0