
Simultaneously do multiple Poisson regressions in Python


I have data of the form

n = number of samples

features: n x 1 matrix
data: n x m matrix

I want to perform multiple Poisson regressions with the same features, where the response values vary across the columns of data. Currently, I do one Poisson regression at a time using sklearn, e.g. my Python code looks something like this:

from sklearn import linear_model

# alpha=0 disables the L2 penalty, so each fit is a plain Poisson GLM
clf = linear_model.PoissonRegressor(fit_intercept=True, alpha=0)
for col in range(m):
    clf.fit(features, data[:, col])

However, I have to run many of these Poisson regressions, and fitting them one at a time is far too slow. So my question is: is there a way (in Python) to fit all m of these Poisson regressions simultaneously?

If I were doing linear regression instead, I could use matrix tricks to solve all m problems simultaneously. However, the key difference here is that Poisson regression has no closed-form solution: it uses an optimization algorithm to maximize a likelihood function. So essentially, I would like to solve multiple optimization problems at once.

One thing I tried was to use scipy.optimize to maximize the sum of the log-likelihoods of all m Poisson regressions. However, this was extremely sensitive to the initialization and did not converge.
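For reference, the joint objective described above can be sketched as follows. This is only an illustration under stated assumptions, not the code that was actually tried: it assumes a log link with an intercept, no penalty, and the L-BFGS-B method with an analytic gradient. The function name `fit_poisson_columns`, the starting point, and the synthetic data are all hypothetical. Because the summed objective separates across columns, its minimizer matches the m per-column fits.

```python
import numpy as np
from scipy.optimize import minimize


def fit_poisson_columns(features, data):
    """Fit one log-link Poisson regression per column of `data` in a single solve.

    Returns a (2, m) array (row 0: intercepts, row 1: slopes) and the
    scipy result object. Hypothetical helper, for illustration only.
    """
    n, m = data.shape
    X = np.column_stack([np.ones(n), features])  # n x p design matrix, intercept first
    p = X.shape[1]

    def objective(flat_coef):
        B = flat_coef.reshape(p, m)   # one coefficient column per regression
        eta = X @ B                   # linear predictors, n x m
        mu = np.exp(eta)              # Poisson means under the log link
        # Negative log-likelihood summed over all columns (constants dropped),
        # scaled by 1/n so its magnitude does not grow with the sample size.
        nll = np.sum(mu - data * eta) / n
        grad = X.T @ (mu - data) / n  # analytic gradient, p x m
        return nll, grad.ravel()

    # Assumed starting point: zero slopes, intercepts at the log of each column mean.
    B0 = np.zeros((p, m))
    B0[0] = np.log(data.mean(axis=0) + 1e-8)

    result = minimize(objective, B0.ravel(), jac=True, method="L-BFGS-B")
    return result.x.reshape(p, m), result


# Synthetic data, only to show the call.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 1))
data = rng.poisson(lam=np.exp(1.0 + 0.3 * features), size=(500, 4))
coef, result = fit_poisson_columns(features, data)
print(coef)
```

Supplying the gradient and starting from the log of the column means are the two choices most likely to affect convergence here; whether they resolve the sensitivity described above on real data is not something this sketch can establish.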

Thus I am hoping there is either:

  1. a better Python optimization package I can use to maximize the Poisson regression objective function (sum of log-likelihoods), or
  2. a Python package that allows you to do simultaneous Poisson regressions.

Does anyone have any ideas? Any help would be greatly appreciated. Thank you!


Solution

  • It looks like you are looking for multiprocessing.
    It lets you run N Poisson regression fits at the same time, one per worker process.
    A simple example of the API, with a toy function f standing in for the regression fit:

    from multiprocessing import Pool
    
    def f(x):
        return x*x
    
    if __name__ == '__main__':
        with Pool(5) as p:
            print(p.map(f, [1, 2, 3]))
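
Applied to the question's setup, the same pattern might look like the sketch below. The helper names `fit_column` and `fit_all_columns`, the number of processes, and the synthetic data are assumptions for illustration; only `Pool.map` and `PoissonRegressor` come from the libraries themselves.

```python
from multiprocessing import Pool

import numpy as np
from sklearn.linear_model import PoissonRegressor


def fit_column(args):
    """Fit one Poisson regression and return (intercept, coefficients)."""
    features, y = args
    clf = PoissonRegressor(fit_intercept=True, alpha=0)
    clf.fit(features, y)
    return clf.intercept_, clf.coef_


def fit_all_columns(features, data, processes=4):
    """Fit every column of `data` against `features`, one task per column."""
    tasks = [(features, data[:, col]) for col in range(data.shape[1])]
    with Pool(processes) as pool:
        # map preserves order: results[i] belongs to column i
        return pool.map(fit_column, tasks)


if __name__ == "__main__":
    # Synthetic data, only to show the call.
    rng = np.random.default_rng(0)
    features = rng.normal(size=(200, 1))
    data = rng.poisson(lam=np.exp(0.5 + 0.3 * features), size=(200, 8))

    results = fit_all_columns(features, data)
    for col, (intercept, coef) in enumerate(results):
        print(col, intercept, coef)
```

Each task pickles its inputs and sends them to a worker, so the speedup is bounded by the number of cores and is likely to be noticeable only when a single fit is expensive relative to that overhead.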