I have a requirement to carry forward values from Col_Value
based on last Col_Date
(based on partition by logic on the columns Col_1,Col_2,Col_3,Col_4
). When there a value in the current row, we should not carry forward and use current value else carry forward from the previous date. If there is no value in the current row and for previous date, then we should use the last recent carry forwarded value.
Here's my input table.
Below is the sql I tried so far.
select *,
case
when col_value is null
then lag(Col_Value) over (PARTITION BY Col_1,Col_2,Col_3,Col_4 order by Col_Date)
else col_value
end as Carry_Fowrard_Value
from carry_table
I used the lag
analytical function and was able to get below results based on the sql I shared above. But I am getting null for the highlighted rows as there are no values for the previous date. How do I carry forward the value based on last available value?
Below is the expected output.
Any help is greatly appreciated.
try the following - first get the count of NULL
values and then fill those nulls it with max()
value within that partition.
Here is the demo in PostgreSQL though, but should work in hive as well.
select
Col_1,
Col_2,
Col_3,
Col_4,
Col_Date,
Col_Value,
coalesce(Col_Value, max(Col_Value) over (partition by Col_1, Col_2, Col_3, Col_4, rn)) as Carry_Forward_Value
from
(
select
*,
count(Col_Value) over (partition by Col_1, Col_2, Col_3, Col_4 order by Col_Date) as rn
from carry_table
) subq
output:
| col_1 | col_2 | col_3 | col_4 | col_date | col_value | carry_forward_value |
| ----- | ----- | ----- | ----- | ---------- | --------- | ------------------- |
| ES | A1 | X1 | Y1 | 2019-12-31 | 0 | 0 |
| ES | A1 | X1 | Y1 | 2020-01-01 | | 0 |
| ES | A1 | X1 | Y1 | 2020-01-31 | 3 | 3 |
| ES | A1 | X1 | Y1 | 2019-02-01 | 4 | 4 |
| ES | A1 | X1 | Y1 | 2019-03-31 | | 4 |
| ES | A1 | X1 | Y1 | 2019-02-29 | | 4 |
| ES | A1 | X1 | Y1 | 2019-03-01 | | 4 |
| ES | A1 | X1 | Y1 | 2019-04-01 | 6 | 6 |
| ES | A1 | X1 | Y1 | 2019-04-30 | | 6 |