I have a hive table that looks like this
id, index, p
1, 1, 0.2
1, 2, 0.1
1, 3, 0.3
1, 4, 0.1
2, 1, 0.2
2, 2, 0.3
2, 3, 0.4
For each id
, I want to calculate the following:
\sum_{i=1}^n(\prod _{k=1}^{i-1}(1-p_k))
For instance, for id=2
, this would be p_1 + (1-p_1)*p2 + (1-p_1)*(1-p_2)*p3
since it only has three probabilities. Is there a way to do this in Hive using windowing functions?
You may try using product with windowing, partitioning by id and ordering by index. I'm not sure if there is an aggregate function for product, but if not, one can make use of log+sum+exp to get the same effect.
with table1 as (
select *,
sum(log(1-p)) over(partition by id order by index rows between unbounded preceding and 1 preceding) as log_sum
from source_table
)
select id, sum(p*coalesce(exp(log_sum), 1)) as p_prod_q_prior
from table1
group by id