I have a table structured as the following:
DateDay Id Value
20200801 A 7
20200803 A 1
20200803 B 3
20200804 C 30
... ... ...
Where DateDay is present only if Value > 0 And I have several possible Ids
What I would like to do is obtain a table with a rolling sum of Value over a period of 7 days, per each Id. But it should have all the days in it
DateDay Id Value RollingSum
20200801 A 7 7
20200801 B 0 0
20200801 C 0 0
20200802 A 0 7
20200802 B 0 0
20200802 C 0 0
20200803 A 1 8
20200803 B 3 3
20200803 C 0 0
20200804 A 0 8
20200804 B 0 3
20200804 C 30 30
... ... ... ...
Is there a simple way to accomplish this via SQL (I'm using Impala/Hive)?
You have two problems -- generating the rows and the cumulative sums. Let me assume that the table has all the time periods you want or you have another table with them. Then use a cross join
to generate the rows and a cumulative sum for the results:
select d.dateday, i.id,
sum(t.value) over (partition by i.id order by d.dateday) as running_sum
from (select distinct dateday from t) d cross join
(select distinct id from t) i left join
t
on t.dateday = d.dateday and t.id = i.id
order by d.dateday, i.id;
If you have other tables with the days or ids, then use those instead of the subqueries.
Once you have all the days, you can include a window frame clause:
sum(t.value) over (partition by i.id order by d.dateday rows between 6 preceding and current day) as running_sum