Search code examples
sqlhiveconditional-statementsprobabilitywindowing

Multiplying numbers using windowing functions for conditional Probability in Hive SQL


I have a hive table that looks like this

id, index, p
1, 1, 0.2 
1, 2, 0.1 
1, 3, 0.3 
1, 4, 0.1 
2, 1, 0.2 
2, 2, 0.3 
2, 3, 0.4

For each id, I want to calculate the following:

\sum_{i=1}^n(\prod _{k=1}^{i-1}(1-p_k))

enter image description here

For instance, for id=2, this would be p_1 + (1-p_1)*p2 + (1-p_1)*(1-p_2)*p3 since it only has three probabilities. Is there a way to do this in Hive using windowing functions?


Solution

  • You may try using product with windowing, partitioning by id and ordering by index. I'm not sure if there is an aggregate function for product, but if not, one can make use of log+sum+exp to get the same effect.

    with table1 as (
        select *, 
        sum(log(1-p)) over(partition by id order by index rows between unbounded preceding and 1 preceding) as log_sum
        from source_table
    )
    select id, sum(p*coalesce(exp(log_sum), 1)) as p_prod_q_prior
    from table1
    group by id