# How to implement Poisson regression with scikit-learn for count data prediction

Tags: python, scikit-learn, scipy, statistics, poisson

I'm looking for a way to combine linear regression with the Poisson distribution. A simple linear regression returns a numerical value, which I would like to use in a Poisson distribution, because I need a *probability*, specifically the *discrete probability of a number of events occurring successively and independently in a given time interval*.

**MY CASE IS:** I have an independent variable `Team_A__goal_scored = [1, 2, 2, 3, 1]` and a dependent variable `Team_B__goal_conceded = [2, 1, 3, 0, 1]`. I would like to calculate the probability that Team A scores exactly `2 goals` against Team B (that is, relate Team A's attack to Team B's defense). Running the regression and evaluating it at 2 goals with `test = myfunc(2)`, I get `1.28`:

```
import numpy
from scipy import stats

Team_B__goal_conceded = [1, 2, 2, 3, 1]  # independent variable (x)
Team_A__goal_scored = [2, 1, 3, 0, 1]  # dependent variable (y)

# Ordinary least-squares fit of y on x
slope, intercept, r, p, std_err = stats.linregress(Team_B__goal_conceded, Team_A__goal_scored)

def myfunc(Team_B__goal_conceded):
    return slope * Team_B__goal_conceded + intercept

regression_scored_2 = myfunc(2)  # result: 1.28
```

**PROBLEM:** I need a probability, but a simple linear regression returns a value such as `1.28`, which is not a probability.

To recap: I know the starting point (`Team_B__goal_conceded` and `Team_A__goal_scored`) and I know what I want to achieve (the probability of Team A scoring exactly `2` goals against Team B), but I don't know how to get there. Since linear regression does not give me a probability, I thought that Poisson regression (also called **log-linear regression**) might be a solution, but I'm having trouble using it. I'm not sure whether it is the right tool, so I'm looking for someone who knows it and can tell me whether it fits my goal and how to use it. If it does not, there are surely other ways to use the result of a linear regression in a Poisson distribution.

**EXPECTATIONS:** The final output I want is the individual probabilities of a soccer team scoring 0 goals, 1 goal, 2 goals, 3 goals, etc., after running the regression. To be precise, I only need exactly `2` goals, so I would like this final output:

```
The probability that Team A scores exactly 2 goals against Team B is: x %
```

**MY CODE:**
I tried to use `sklearn.linear_model.PoissonRegressor`, but it doesn't work properly. As I said above, I don't even know whether this is the most suitable solution for my case or whether there are better and more appropriate methods.

```
from sklearn import linear_model
clf = linear_model.PoissonRegressor()
Team_A__goal_scored = [1, 2, 2, 3, 1] #Var indipendent (x)
Team_B__goal_conceded = [2, 1, 3, 0, 1] #Var dipendent (y)
a=clf.fit(Team_A__goal_scored, Team_B__goal_conceded)
print("Fit a Generalized Linear Model: ", a, "\n")
b = clf.score(Team_A__goal_scored, Team_B__goal_conceded)
print("Compute D^2, the percentage of deviance explained: ", b, "\n")
c = clf.coef_
print("Estimated coefficients for the linear predictor: ", c, "\n")
d= clf.intercept_
print("Intercept (a.k.a. bias) added to linear predictor: ", d, "\n")
e=clf.predict([[1, 1], [3, 4]])
print("Predict using GLM with feature matrix X: ", e, "\n")
```

I get this error:

```
ValueError: Expected 2D array, got 1D array instead:
array=[1. 2. 2. 3. 1.].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
```

I think the lists should look like the scikit-learn example, `X = [[1, 2], [2, 3], [3, 4], [4, 3]]` and `y = [12, 17, 22, 21]`, but I don't understand how to put mine into this form.

I also don't know how to specify that the result must be exactly `2` goals (the probability of Team A scoring exactly 2 goals against Team B).

Thank you

## Solution

Your approach with `PoissonRegressor` is suitable for your problem, but it expects a 2D array for the features (`X`). You can reshape your 1D array using `numpy.reshape`:

```
import numpy as np
from sklearn.linear_model import PoissonRegressor
Team_A__goal_scored = np.array([1, 2, 2, 3, 1]).reshape(-1, 1)
Team_B__goal_conceded = np.array([2, 1, 3, 0, 1])
```

Then you can fit the model:

```
clf = PoissonRegressor()
clf.fit(Team_A__goal_scored, Team_B__goal_conceded)
```

Then you can make the prediction (e.g. `Team_A__goal_scored = 2`):

```
lambda_pred = clf.predict(np.array([[2]]))[0]
```
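To see where the predicted value comes from, here is a minimal sketch, assuming the default settings of `PoissonRegressor` (log link, L2 penalty `alpha=1.0`). It checks that `predict` returns the exponential of the linear predictor, so the result is always a positive rate rather than a raw regression value. The names `x_new`, `manual_rate` and `predicted_rate` are only illustrative.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Data from the question: one feature column (X) and the observed counts (y).
X = np.array([1, 2, 2, 3, 1]).reshape(-1, 1)
y = np.array([2, 1, 3, 0, 1])

# Default settings; alpha=1.0 is an L2 penalty, alpha=0 would disable it.
clf = PoissonRegressor().fit(X, y)

# With the log link, the predicted mean is exp(intercept + coef * x).
x_new = 2  # illustrative input value
manual_rate = np.exp(clf.intercept_ + clf.coef_[0] * x_new)
predicted_rate = clf.predict(np.array([[x_new]]))[0]

print(manual_rate, predicted_rate)  # the two values should agree
```

With only five observations the default penalty pulls the coefficient strongly towards zero, so it may be worth comparing against `PoissonRegressor(alpha=0)`.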

Then you can use the Poisson probability mass function to find the probability of scoring exactly 2 goals:

```
from scipy.stats import poisson
probability_two_goals = poisson.pmf(2, lambda_pred)
```
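As a sanity check on what `poisson.pmf` computes, the sketch below evaluates the Poisson formula P(k) = λ^k · e^(−λ) / k! by hand and compares it with SciPy. The rate `lam = 1.3` is a made-up value for illustration, not a result from the model above.

```python
import math
from scipy.stats import poisson

lam = 1.3  # hypothetical rate, for illustration only
k = 2      # number of goals we want the probability for

# Poisson probability mass function written out explicitly.
manual = lam**k * math.exp(-lam) / math.factorial(k)

# The same quantity from SciPy.
from_scipy = poisson.pmf(k, lam)

print(manual, from_scipy)  # the two values should agree
```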

This gives you the probability that Team A scores exactly 2 goals against Team B.
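Putting the steps together, a self-contained sketch might look like the following. It also prints the probabilities for 0 to 5 goals, since the question asks for the individual probabilities; the upper bound of 5 and the names `goals` and `probabilities` are arbitrary choices for illustration. With only five data points the numbers are illustrative rather than a reliable estimate.

```python
import numpy as np
from scipy.stats import poisson
from sklearn.linear_model import PoissonRegressor

# Data from the question; X must be 2D with shape (n_samples, n_features).
X = np.array([1, 2, 2, 3, 1]).reshape(-1, 1)
y = np.array([2, 1, 3, 0, 1])

# Fit the Poisson regression and predict the expected rate for an input of 2.
clf = PoissonRegressor().fit(X, y)
lambda_pred = clf.predict(np.array([[2]]))[0]

# Probability of each goal count from 0 to 5 under the predicted rate.
goals = np.arange(0, 6)
probabilities = poisson.pmf(goals, lambda_pred)

for k, p in zip(goals, probabilities):
    print(f"P({k} goals) = {p:.1%}")

print(
    "The probability that Team A scores exactly 2 goals against Team B is: "
    f"{probabilities[2]:.1%}"
)
```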
