
Clamping Negative Logarithm of the Probability to a Positive Value in an Information Retrieval Environment (Language Modelling)


If we take the logarithm of a probability, the value returned is negative (or zero when the probability is 1). The value is used in a matcher of an information retrieval library that rejects negative values, so I need to clamp the negative value to a positive one so that the matcher doesn't reject the document.

One approach could be to add an arbitrary constant K to the probability, or to multiply the probability by K:

that is, return either max(log(prob + K), 0) or max(log(K * prob), 0), where K is a large constant.
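As a minimal sketch of the two options above (Python; the helper names shifted_log and scaled_log and the value K = 1000 are illustrative assumptions, not part of any library), the following evaluates both transforms on a few probabilities. It shows that log(p + K) with a large K squashes every probability to roughly log(K), while max(log(K * p), 0) keeps the exact differences log(K) + log(p) but clamps everything at or below 1/K to 0.

```python
import math

def shifted_log(prob: float, k: float) -> float:
    """Option 1: max(log(prob + K), 0). For K >= 1 the log is already >= 0."""
    return max(math.log(prob + k), 0.0)

def scaled_log(prob: float, k: float) -> float:
    """Option 2: max(log(K * prob), 0). Equals log(K) + log(prob) unless clamped."""
    if prob <= 0.0:
        return 0.0  # log(0) is undefined; treat a zero probability as fully clamped
    return max(math.log(k * prob), 0.0)

K = 1000.0  # illustrative value only
for p in (1e-6, 1e-3, 0.1, 0.9):
    print(f"p={p:g}  log(p+K)={shifted_log(p, K):.6f}  log(K*p)={scaled_log(p, K):.6f}")
```

With option 1, p = 0.9 and p = 1e-6 end up less than 0.001 apart, so the matcher barely sees a difference. With option 2, any probability below 1/K collapses to the same score of 0, which is what makes the choice of K matter.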

Is there a better approach to clamp the negative log value to a positive one? Which of these would be the better approach to follow?

If we select either of the above approaches, I am quite unsure how to choose an appropriate K. I would be glad if someone could suggest how to select a suitably large K.
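One way to reason about K for the second option is sketched below. It assumes that each document's score is a sum over the same n query terms (as in query-likelihood ranking) and that your smoothed model never emits a probability below some known floor p_min; both are assumptions, not details from the question. Since log(K * p) = log(K) + log(p), every document's score shifts by the same n * log(K), so the ranking is unchanged, and choosing K >= 1/p_min means the clamp at 0 never discards information. The document probabilities and the helper names choose_k and clamped_log_score are made up for illustration.

```python
import math

def choose_k(p_min: float) -> float:
    """Smallest K for which log(K * p) >= 0 for every p >= p_min."""
    return 1.0 / p_min

def clamped_log_score(term_probs: list[float], k: float) -> float:
    """Sum of max(log(K * p), 0) over the query terms of one document."""
    return sum(max(math.log(k * p), 0.0) for p in term_probs)

# Hypothetical smoothed probabilities of the same 3 query terms in three documents.
docs = {
    "d1": [0.02, 0.001, 0.05],
    "d2": [0.01, 0.004, 0.03],
    "d3": [0.0005, 0.3, 0.01],
}

# Assumption: the smallest probability the model can produce is known.
# Here we simply take the minimum over the example data.
p_min = min(p for probs in docs.values() for p in probs)
K = choose_k(p_min)

raw = {d: sum(math.log(p) for p in probs) for d, probs in docs.items()}
clamped = {d: clamped_log_score(probs, K) for d, probs in docs.items()}

print(sorted(docs, key=raw.get, reverse=True))      # ranking by the true log-likelihood
print(sorted(docs, key=clamped.get, reverse=True))  # same ranking, all scores >= 0
```

Any larger K also works under these assumptions; it only adds a bigger constant offset to every score. The offset argument breaks down if documents are scored over different numbers of terms.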

P.S. It is important to use logarithm values: the model we are implementing needs to multiply probabilities, but the architecture cannot support that, so we sum the logs of the probabilities instead, which gives the log of their product. Hence using log values is important here (taking the antilog is not a workable approach).
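To illustrate why the log domain matters here, the short Python sketch below (with made-up probabilities) checks that a sum of logs equals the log of the product, and shows that multiplying many small probabilities directly underflows to 0.0 in double precision while the sum of their logs stays finite.

```python
import math

# The identity the P.S. relies on: log(p1 * p2 * p3) == log(p1) + log(p2) + log(p3).
probs = [0.2, 0.5, 0.1]
assert math.isclose(math.log(math.prod(probs)), sum(math.log(p) for p in probs))

# 100 hypothetical term probabilities of 1e-5 each.
many = [1e-5] * 100

product = 1.0
for p in many:
    product *= p  # the true value, 1e-500, is far below the smallest positive double

log_sum = sum(math.log(p) for p in many)  # about 100 * log(1e-5), no underflow

print(product)  # 0.0
print(log_sum)
```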


Solution

  • You can always use log(1 + p). This shifts the range of values from (-inf, 0] for log(p) to [0, log(2)] for log(1 + p), which I think will solve your problem.

    The most common approach in general is to take the negative of the log, as suggested by others. You could alternatively use 1/(1 - log(p)), but that will not be helpful in your case.

    So log(1 + p) seems to be the best solution.
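The three transforms mentioned in this answer can be compared with a small Python sketch (the function names are illustrative). One caveat worth checking before adopting log(1 + p): it is non-negative and increasing in p, but a sum of log(1 + p_i) is not the log of the product of the p_i, so it does not preserve the product semantics described in the question's P.S. By contrast, -log(p) is non-negative and additive, with the caveat that a smaller score then means a more probable match.

```python
import math

def log1p_score(p: float) -> float:
    """log(1 + p): maps p in [0, 1] to [0, log 2]; increasing in p."""
    return math.log1p(p)

def neg_log_score(p: float) -> float:
    """-log(p): maps p in (0, 1] to [0, inf); decreasing in p (smaller = more probable)."""
    return -math.log(p)

def inverse_score(p: float) -> float:
    """1 / (1 - log(p)): maps p in (0, 1] to (0, 1]; increasing in p."""
    return 1.0 / (1.0 - math.log(p))

probs = [0.2, 0.5, 0.1]  # hypothetical per-term probabilities
product = math.prod(probs)

# -log is additive: the sum over terms equals -log of the product.
print(sum(neg_log_score(p) for p in probs), neg_log_score(product))

# log(1 + p) is not: the sum over terms differs from log(1 + product).
print(sum(log1p_score(p) for p in probs), log1p_score(product))
```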