Tags: classification, logarithm, hidden-markov-models

Jahmm lib: how to interpret negative value from ForwardBackwardScaledCalculator.lnProbability()?


I use the Jahmm library for classification of accelerometer sequences.

I have created my models, but when I try to calculate the probability of a test sequence on a model with:

    ForwardBackwardScaledCalculator fbsc =
            new ForwardBackwardScaledCalculator(test_pair.getValue(),
                                                model_pair.getValue().get_hmm());
    System.out.println(fbsc.lnProbability());

I get negative values like -1278.0926336276573.

The comment in the library's source code states that the lnProbability method will:

Return the napierian logarithm of the probability of the sequence that generated this object.

Returns: The probability of the sequence of interest's napierian logarithm

But how do I compare two such logarithms? I call the method on two different models with the two test sequences, so I get four probabilities:

The test sequence fast_test.seq on fast_model yields a Napierian log of -1278.0926336276573
The test sequence fast_test.seq on slow_model yields a Napierian log of -1862.6947488370433
The test sequence slow_test.seq on fast_model yields a Napierian log of -4433.949818774553
The test sequence slow_test.seq on slow_model yields a Napierian log of -4208.071445499895

But in this context, does it mean that the closer we get to zero, the more similar the test sequence is to the model (so in this example, the classification accuracy would be 100%)?

Thank you


Solution

  • If by "Napierian logarithm" the natural logarithm is meant, then you can recover a probability from a return value x by raising e to the x, e.g. using Math.exp. However, the reason logarithms are returned is that the probability values are too small to be represented in a double; Math.exp(-1278.0926336276573) will simply return zero. See the Wikipedia article on log probabilities.

    does it mean that the closer we get to zero, the more similar the test sequence is to the model

    exp(0) == 1 and log(1) == 0, and indeed the lower the probability, the smaller (more negative) its logarithm. So yes: the closer you get to zero, the more probable the sequence is under the model (see the sketch at the end of this answer).

    However, this need not directly relate to "similarity to a model", let alone "classification accuracy", since HMMs (being generative models) assign lower probability to longer sequences. Read up on HMMs in your favorite textbook; a full explanation would be too long for this answer box, and as a math question it is off-topic for this website.
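
    To make the comparison concrete, here is a minimal, self-contained Java sketch using the four log-likelihoods reported in the question. The class name LogLikelihoodDemo is made up for illustration; the sketch shows both the underflow problem and the "pick the larger (less negative) log-likelihood" classification rule:

        public class LogLikelihoodDemo {
            public static void main(String[] args) {
                // Underflow: e^-1278 is far below the smallest positive
                // double (about 4.9e-324), so converting back to a plain
                // probability loses all information.
                System.out.println(Math.exp(-1278.0926336276573)); // prints 0.0
                System.out.println(Math.exp(-1862.6947488370433)); // prints 0.0

                // Compare in log space instead: log is monotonic, so the
                // larger (less negative) log-likelihood marks the more
                // probable model.
                double fastTestOnFastModel = -1278.0926336276573;
                double fastTestOnSlowModel = -1862.6947488370433;
                String label = fastTestOnFastModel > fastTestOnSlowModel
                        ? "fast" : "slow";
                System.out.println("fast_test.seq classified as: " + label);
            }
        }

    Applying the same rule to slow_test.seq (-4433.95 on fast_model vs. -4208.07 on slow_model) also selects the correct model, which is exactly the 100% result observed in the question.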