I have spent days trying to figure out how to add missing dates with the nearest values in Hive with no luck. I need to use Hive SQL for this based on environment constraints. The raw table currently looks like the below table.
account name,available balance,Date of balance
Peter,50000,2021-05-24
Peter,50035,2021-05-25
Peter,50035,2021-05-26
Peter,50610,2021-05-28
Peter,51710,2021-06-01
Peter,53028.1,2021-06-02
Peter,53916.1,2021-06-03
Mary,50000,2021-05-24
Mary,50035,2021-05-25
Mary,53028.1,2021-05-30
What I need is to convert the table above to the table in below link:
account name,available balance,Date of balance
Peter,50000,2021-05-24
Peter,50035,2021-05-25
Peter,50035,2021-05-26
Peter,50035,2021-05-27
Peter,50610,2021-05-28
Peter,50610,2021-05-29
Peter,50610,2021-05-30
Mary,50000,2021-05-24
Mary,50035,2021-05-25
Mary,50035,2021-05-26
Mary,50035,2021-05-27
Mary,50035,2021-05-28
Mary,50035,2021-05-29
Mary,53028.1,2021-05-30
Can anyone please share the Hive SQL logic to make this change please?
Get next date using lead() function, calculate difference in days, get string of spaces with length = diff in days, split, use posexplode to generate rows, use position to add to date to get missing dates:
with mytable as (--Demo dataset, use your table instead of this
select stack(10, --number of tuples
'Peter',float(50000),'2021-05-24',
'Peter',float(50035),'2021-05-25',
'Peter',float(50035),'2021-05-26',
'Peter',float(50610),'2021-05-28',
'Peter',float(51710),'2021-06-01',
'Peter',float(53028.1),'2021-06-02',
'Peter',float(53916.1),'2021-06-03',
'Mary',float(50000),'2021-05-24',
'Mary',float(50035),'2021-05-25',
'Mary',float(53028.1),'2021-05-30'
) as (account_name,available_balance,Date_of_balance)
) --use your table instead of this CTE
select account_name, available_balance, date_add(Date_of_balance,e.i) as Date_of_balance
from
( --Get next_date to generate date range
select account_name,available_balance,Date_of_balance,
lead(Date_of_balance,1, Date_of_balance) over (partition by account_name order by Date_of_balance) next_date
from mytable d --use your table
) s lateral view outer posexplode(split(space(datediff(next_date,Date_of_balance)-1),'')) e as i,x --generate rows
order by account_name desc, Date_of_balance --this is to have order of rows like in your Converted Table
Result:
account_name available_balance date_of_balance
Peter 50000 2021-05-24
Peter 50035 2021-05-25
Peter 50035 2021-05-26
Peter 50035 2021-05-27
Peter 50610 2021-05-28
Peter 50610 2021-05-29
Peter 50610 2021-05-30
Peter 50610 2021-05-31
Peter 51710 2021-06-01
Peter 53028.1 2021-06-02
Peter 53916.1 2021-06-03
Mary 50000 2021-05-24
Mary 50035 2021-05-25
Mary 50035 2021-05-26
Mary 50035 2021-05-27
Mary 50035 2021-05-28
Mary 50035 2021-05-29
Mary 53028.1 2021-05-30