Working with small probabilities, via logs

Source: Google Code Jam. https://code.google.com/codejam/contest/10224486/dashboard#s=a&a=1

We're asked to calculate Prob(K successes from N trials) where each of the N trials has a known success probability of p_n.

Some Analysis and thoughts on the problem are given after the Code Jam.

They observe that evaluating all possible outcomes of your N trials would take you an exponential time in N, so instead they provide a nice "dynamic programming" style solution that's O(N^2).

Let P(p#q) = Prob(p Successes after the first q Trials) Then observe the fact that:

Prob(p#q) = Prob(qth trial succeeds)*P(p-1#q-1) + Prob(qth trial fails)*P(p#q-1)

Now we can build up a table of P(i#j) where i<=j, for i = 1...N

That's all lovely - I follow all of this and could implement it.

Then as the last comment, they say:

In practice, in problems like this, one should store the logarithms of
probabilities instead of the actual values, which can become small
enough for floating-point precision errors to matter.

I think I broadly understand the point they're trying to make, but I specifically can't figure out how to use this suggestion.

Taking the above equation, and substuting in some lettered variables:

P = A*B + C*D

If we want to work in Log Space, then we have:

Log(P) = Log(A*B + C*D),

where we have recursively pre-computed Log(B) and Log(D), and A & B are known, easily-handled decimals.

But I don't see any way to calculate Log(P) without just doing e^(Log(B)), etc. which feels like it would defeat to point of working in log space`?

Does anyone understand in better detail what I'm supposed to be doing?

Solution

Starting from the initial relation:

P = A⋅B + C⋅D

Due to its symmetry we can assume that B is larger than D, without loss of generality. The following processing is useful:

log(P) = log(A⋅B + C⋅D) = log(A⋅e^log(B) + C⋅e^log(D)) = log(e^log(B)⋅(A + C⋅e^{log(D) - log(B)})

log(P) = log(B) + log(A + C⋅e^{log(D) - log(B)}).

This is useful because, in this case, log(B) and log(D) are both negative numbers (logarithms of some probabilities). It was assumed that B is larger than D, thus its log is closer to zero. Therefore log(D) - log(B) is still negative, but not as negative as log(D).

So now, instead of needing to perform exponentiation of log(B) and log(D) separately, we only need to perform exponentiation of log(D) - log(B), which is a mildly negative number. So the above processing leads to better numerical behavior than using logarithms and applying exponentiation in the trivial way, or, equivalently, than not using logarithms at all.