python dataframe scipy scikit-learn poisson

How to fit a column of a dataframe to a Poisson distribution in Python


I have been trying to find a way to fit some of my columns (which contain user click data) to a Poisson distribution in Python. These columns (e.g., click_website_1, click_website_2) may contain values ranging from 1 to several thousand. I am trying to do this because some resources recommend it:

We recommend that count data should not be analysed by log-transforming it, but instead models based on Poisson and negative binomial distributions should be used.

I found some methods in scipy and numpy, but they seem to generate random numbers that follow a Poisson distribution. What I am interested in is fitting my own data to a Poisson distribution. Are there any library suggestions for doing this in Python?


Solution

  • Here is a quick way to check whether your data follows a Poisson distribution: plot the pmf under the assumption that the data follows a Poisson distribution with rate parameter lambda = data.mean().

    import numpy as np
    from scipy.special import factorial  # factorial lives in scipy.special
    
    
    def poisson(k, lamb):
        """Poisson pmf; lamb is the rate parameter being fitted."""
        return (lamb**k / factorial(k)) * np.exp(-lamb)
    
    # Let's collect the clicks, since we are going to need them later.
    # df is assumed to be the DataFrame that holds the click columns.
    clicks = df["click_website_1"]
    

    Here we use the pmf of the Poisson distribution.
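Since the question mentions counts in the thousands, the direct formula above can overflow (both lamb**k and factorial(k) become infinite in floating point for large k). A minimal sketch of a log-space version follows, checked against scipy.stats.poisson.pmf. The function name poisson_pmf_stable and the sample values of k and lamb are illustrative assumptions, not part of the original answer.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson as poisson_dist


def poisson_pmf_stable(k, lamb):
    """Poisson pmf evaluated in log space to avoid overflow for large k."""
    k = np.asarray(k, dtype=float)
    # log pmf = k*log(lamb) - lamb - log(k!), with log(k!) = gammaln(k + 1)
    log_pmf = k * np.log(lamb) - lamb - gammaln(k + 1)
    return np.exp(log_pmf)


# Hypothetical values: counts in the thousands and an assumed rate.
k = np.array([1, 50, 1150, 1200, 1250])
lamb = 1200.0

stable = poisson_pmf_stable(k, lamb)
reference = poisson_dist.pmf(k, lamb)  # SciPy's built-in Poisson pmf
print(stable)
print(reference)
```

In practice, calling scipy.stats.poisson.pmf directly is probably the simplest option; the hand-written version is only there to show what the log-space computation looks like.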

    Now let's do some modeling. From the data (click_website_1) we'll estimate the Poisson parameter using the MLE, which turns out to be just the sample mean.

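    lamb = clicks.mean()
    
    # Plot the pmf using lamb as an estimate for `lambda`.
    # Sort the counts in the column first; sort_values() returns a sorted copy,
    # and lamb is passed through to poisson() via args=.
    
    clicks.sort_values().apply(poisson, args=(lamb,)).reset_index(drop=True).plot()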
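
To judge the fit rather than only plot the fitted pmf, one option is to put the observed relative frequency of each count next to the fitted Poisson probability, and to compare the sample variance with the mean (for a Poisson distribution the two are equal). The sketch below is a rough check under stated assumptions: the clicks series is synthetic stand-in data generated for illustration, not the asker's data, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import poisson as poisson_dist

# Synthetic stand-in for df["click_website_1"] (assumption: real data goes here).
rng = np.random.default_rng(0)
clicks = pd.Series(rng.poisson(lam=4.0, size=5000), name="click_website_1")

# MLE of the Poisson rate is the sample mean.
lamb = clicks.mean()

# Observed relative frequency of each distinct count value.
observed = clicks.value_counts(normalize=True).sort_index()

# Fitted Poisson probability at the same count values.
expected = pd.Series(
    poisson_dist.pmf(observed.index.to_numpy(), lamb), index=observed.index
)

comparison = pd.DataFrame({"observed": observed, "expected": expected})
print(comparison)

# Variance-to-mean ratio: close to 1 is consistent with a Poisson model,
# while a ratio well above 1 suggests overdispersion.
dispersion = clicks.var() / lamb
print(dispersion)
```

If the dispersion ratio is far above 1 on real click data, the negative binomial model mentioned in the quoted recommendation may be a better candidate than the Poisson.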