Search code examples
sqlhivewindow-functionslag

Carry forward values in Hive


I have a requirement to carry forward values from Col_Value based on last Col_Date (based on partition by logic on the columns Col_1,Col_2,Col_3,Col_4). When there a value in the current row, we should not carry forward and use current value else carry forward from the previous date. If there is no value in the current row and for previous date, then we should use the last recent carry forwarded value.

Here's my input table.

enter image description here

Below is the sql I tried so far.

select *,
case 
    when col_value is null 
        then lag(Col_Value) over (PARTITION BY Col_1,Col_2,Col_3,Col_4 order by Col_Date) 
    else col_value
end as Carry_Fowrard_Value
from carry_table

I used the lag analytical function and was able to get below results based on the sql I shared above. But I am getting null for the highlighted rows as there are no values for the previous date. How do I carry forward the value based on last available value?

enter image description here

Below is the expected output.

enter image description here

Any help is greatly appreciated.


Solution

  • try the following - first get the count of NULL values and then fill those nulls it with max() value within that partition.

    Here is the demo in PostgreSQL though, but should work in hive as well.

    select
      Col_1, 
      Col_2, 
      Col_3, 
      Col_4, 
      Col_Date,
      Col_Value,
      coalesce(Col_Value, max(Col_Value) over (partition by Col_1, Col_2, Col_3, Col_4, rn)) as Carry_Forward_Value
    from
    (
      select
          *,
          count(Col_Value) over (partition by Col_1, Col_2, Col_3, Col_4 order by Col_Date) as rn
      from carry_table
    ) subq
    

    output:

    | col_1 | col_2 | col_3 | col_4 | col_date   | col_value | carry_forward_value |
    | ----- | ----- | ----- | ----- | ---------- | --------- | ------------------- |
    | ES    | A1    | X1    | Y1    | 2019-12-31 | 0         | 0                   |
    | ES    | A1    | X1    | Y1    | 2020-01-01 |           | 0                   |
    | ES    | A1    | X1    | Y1    | 2020-01-31 | 3         | 3                   |
    | ES    | A1    | X1    | Y1    | 2019-02-01 | 4         | 4                   |
    | ES    | A1    | X1    | Y1    | 2019-03-31 |           | 4                   |
    | ES    | A1    | X1    | Y1    | 2019-02-29 |           | 4                   |
    | ES    | A1    | X1    | Y1    | 2019-03-01 |           | 4                   |
    | ES    | A1    | X1    | Y1    | 2019-04-01 | 6         | 6                   |
    | ES    | A1    | X1    | Y1    | 2019-04-30 |           | 6                   |