I have been trying to find a way to fit some of my columns (that contains user click data) to poisson distribution in python. These columns (e.g., click_website_1, click_website_2) may contain a value ranging from 1 to thousands. I am trying to do this as it is recommended by some resources:
We recommend that count data should not be analysed by log-transforming it, but instead models based on Poisson and negative binomial distributions should be used.
I found some methods in scipy
and numpy
, but these methods seem to generate some random numbers that have poisson distribution. However, what I am interested in is to fit my own data to poisson distribution. Any library suggestions to do this in Python?
Here is a quick way to check if your data follows a poisson distribution. You plot the under the assumption that it follows a poisson distribution with rate parameter lambda = data.mean()
import numpy as np
from scipy.misc import factorial
def poisson(k, lamb):
"""poisson pdf, parameter lamb is the fit parameter"""
return (lamb**k/factorial(k)) * np.exp(-lamb)
# lets collect clicks since we are going to need it later
clicks = df["clicks_website_1"]
Here we use the pmf for possion distribution.
Now lets do some modeling, from data (click_website_one) we'll estimate the the poisson parameter using the MLE, which turns out to be just the mean
lamb = clicks.mean()
# plot the pmf using lamb as as an estimate for `lambda`.
# let sort the counts in the columns first.
clicks.sort().apply(poisson, lamb).plot()