java algorithm math statistics multiplatform

Maximum Likelihood Estimation of a Poisson Distribution?


I have a table of observations (x, y) and need to estimate the mean of the Poisson distribution that most closely resembles them. It seems R and Octave can both do this on Linux, but I was wondering if there is a multiplatform way to do it. I can bundle anything with the program, but I can't ask users to install anything for it to run.

I tried searching for an algorithm to do it myself and couldn't find one, so I don't know what to do.

For the record, I did find a simple algorithm that basically sums all the values and divides by the number of observations, but it fails even on a trivial example taken directly from a book.

Example:

requisitions per day : absolute frequency (days) : relative frequency
 8 :  2 : 0.016
 9 :  4 : 0.033
10 :  6 : 0.050
11 :  8 : 0.066
12 : 10 : 0.083
13 : 12 : 0.100
14 : 13 : 0.108
15 : 14 : 0.116
16 : 12 : 0.100
17 : 10 : 0.083
18 :  9 : 0.075
19 :  7 : 0.058
20 :  5 : 0.041
21 :  3 : 0.025
22 :  2 : 0.016
23 :  2 : 0.016
24 :  1 : 0.008

The mean of the Poisson distribution should be 15 (according to the book the example comes from). The method I described above, which also appears in one of the answers, gives me 16. Using the sum of squared Euclidean distances, I also find that the Poisson distribution with mean 15 is closer to the data than the one with mean 16.


Solution

  • The MLE of the mean is just the sample mean. See Wikipedia:

    http://en.wikipedia.org/wiki/Poisson_distribution#Maximum_likelihood

    Just average your vector of data.
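
    For reference, a short sketch of why the sample mean is the MLE, assuming n independent observations k_1, ..., k_n drawn from a Poisson distribution with mean \lambda (these symbols are notation introduced here, not taken from the question):

    ```latex
    \begin{align*}
    L(\lambda) &= \prod_{i=1}^{n} \frac{\lambda^{k_i} e^{-\lambda}}{k_i!} \\
    \ln L(\lambda) &= \Big(\sum_{i=1}^{n} k_i\Big) \ln\lambda \;-\; n\lambda \;-\; \sum_{i=1}^{n} \ln(k_i!) \\
    \frac{d}{d\lambda} \ln L(\lambda) &= \frac{1}{\lambda} \sum_{i=1}^{n} k_i \;-\; n = 0
    \quad\Longrightarrow\quad
    \hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} k_i
    \end{align*}
    ```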

    Update: I'm extending this answer now, based on the sample data just added to the question.

    My interpretation of the sample data is that

    reqs-per-day   frequency
     8             2
     9             4
    10             6
    

    means that there were two days on which the requisition count was 8, and four days on which it was 9. Therefore, I will assume that the data is equivalent to:

    8,8,9,9,9,9,10,10,10,10,10,10,...
    

    where each entry in this list corresponds to one day. The order of this list doesn't matter. I think you should average this list.

    The total of your frequency field is 120. I take this to mean there were 120 days altogether in the experiment.
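
Averaging the expanded list is the same as taking a frequency-weighted mean of the table, so the list never has to be built. A minimal Java sketch of that calculation follows, assuming the table is held in two parallel arrays; the class name `PoissonMle`, the method `weightedMean`, and the array names are illustrative choices, not part of the question.

```java
import java.util.Locale;

/** Sketch: Poisson MLE (the sample mean) computed from a frequency table. */
final class PoissonMle {

    // Frequency table copied from the question:
    // requisitions per day, and the number of days with that count.
    static final int[] REQS_PER_DAY =
            {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24};
    static final int[] DAYS =
            {2, 4, 6, 8, 10, 12, 13, 14, 12, 10, 9, 7, 5, 3, 2, 2, 1};

    /**
     * Returns sum(value * frequency) / sum(frequency), i.e. the mean of the
     * expanded list in which each value is repeated "frequency" times.
     */
    static double weightedMean(int[] values, int[] frequencies) {
        if (values.length != frequencies.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        long weightedSum = 0;
        long count = 0;
        for (int i = 0; i < values.length; i++) {
            weightedSum += (long) values[i] * frequencies[i];
            count += frequencies[i];
        }
        if (count == 0) {
            throw new IllegalArgumentException("no observations");
        }
        return (double) weightedSum / count;
    }

    public static void main(String[] args) {
        double lambda = weightedMean(REQS_PER_DAY, DAYS);
        System.out.printf(Locale.ROOT, "estimated lambda = %.4f%n", lambda);
    }
}
```

On the table from the question this works out to 1802 / 120, roughly 15.02, which agrees with the book's value of 15. An unweighted average of the 17 distinct values 8 through 24 is exactly 16, which may be where that figure came from.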