# Getting a different kurtosis from numpy array method than from Summary

I have to extract information from the statsmodels OLS summary. While doing so, I noticed that the kurtosis reported in the summary differs from the value returned by the array method `kurtosis()`.

Here is the code:

```
from sklearn.datasets import load_diabetes
import pandas as pd
import statsmodels.api as sm
dic = load_diabetes()
df = pd.DataFrame(data=dic.data, columns=dic.feature_names)
y = dic.target
# %%
X = sm.add_constant(df)
model = sm.OLS(y, X)
res = model.fit()
print(res.summary2())
print(f'\n\nKurtosis by Array Method: {res.resid.kurtosis():.3f}')
print(f'Skewness by Array Method: {res.resid.skew():.3f}')
```

Output:

```
"""
Results: Ordinary least squares
==================================================================
Model:              OLS              Adj. R-squared:     0.507
Dependent Variable: y                AIC:                4793.9857
Date:               2023-10-20 16:26 BIC:                4838.9901
No. Observations:   442              Log-Likelihood:     -2386.0
Df Model:           10               F-statistic:        46.27
Df Residuals:       431              Prob (F-statistic): 3.83e-62
R-squared:          0.518            Scale:              2932.7
------------------------------------------------------------------
          Coef.  Std.Err.        t   P>|t|     [0.025     0.975]
------------------------------------------------------------------
const  152.1335    2.5759  59.0614  0.0000   147.0707   157.1963
age    -10.0099   59.7492  -0.1675  0.8670  -127.4460   107.4263
sex   -239.8156   61.2223  -3.9171  0.0001  -360.1471  -119.4841
bmi    519.8459   66.5334   7.8133  0.0000   389.0755   650.6163
bp     324.3846   65.4220   4.9583  0.0000   195.7988   452.9705
s1    -792.1756  416.6799  -1.9012  0.0579 -1611.1530    26.8017
s2     476.7390  339.0305   1.4062  0.1604  -189.6198  1143.0978
s3     101.0433  212.5315   0.4754  0.6347  -316.6838   518.7703
s4     177.0632  161.4758   1.0965  0.2735  -140.3147   494.4412
s5     751.2737  171.9000   4.3704  0.0000   413.4072  1089.1402
s6      67.6267   65.9843   1.0249  0.3060   -62.0643   197.3177
------------------------------------------------------------------
Omnibus:            1.506            Durbin-Watson:      2.029
Prob(Omnibus):      0.471            Jarque-Bera (JB):   1.404
Skew:               0.017            Prob(JB):           0.496
Kurtosis:           2.726            Condition No.:      227
==================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the
errors is correctly specified.
Kurtosis by Array Method: -0.264
Skewness by Array Method: 0.017
"""
```

I want to know which of the two results is more reliable and, if I have to use the summary result, how to extract it. I'm also printing the skewness from the array method to check whether my approach is correct or whether I'm doing something wrong.

I also tried the SciPy stats function; its result (-0.274) is similar to the array method's but not equal to it.

## Solution

This seems to be the difference between the Pearson kurtosis and Fisher (or excess) kurtosis. According to Wikipedia:

> It is common practice to use excess kurtosis, which is defined as Pearson's kurtosis minus 3, to provide a simple comparison to the normal distribution.

When you subtract 3 from the kurtosis value in the summary, you obtain the same value as with `scipy.stats.kurtosis`. In fact, the function `scipy.stats.kurtosis` has an option `fisher`, which is `True` by default but can be set to `False` to get the same result as in the summary:

```
from scipy.stats import kurtosis
kurtosis(res.resid) # gives -0.2740841793704205
kurtosis(res.resid, fisher=False) # gives +2.7259158206295795
```
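To make the relationship between the two definitions concrete, here is a minimal sketch that computes both by hand from the central moments and compares them with SciPy. The sample data and variable names are made up for illustration; they are not the residuals from the question.

```python
import numpy as np
from scipy.stats import kurtosis

# Synthetic sample (an assumption for illustration, not the OLS residuals).
rng = np.random.default_rng(0)
x = rng.normal(size=500)

# Biased (divide-by-n) central moments of order 2 and 4.
centered = x - x.mean()
m2 = np.mean(centered**2)
m4 = np.mean(centered**4)

pearson = m4 / m2**2  # Pearson kurtosis: a normal distribution has 3
fisher = pearson - 3  # Fisher (excess) kurtosis: a normal distribution has 0

# Both hand-computed values should agree with SciPy's two modes.
print(np.isclose(pearson, kurtosis(x, fisher=False)))
print(np.isclose(fisher, kurtosis(x)))
```

The only difference between the two numbers is the constant 3, so either one can be reported as long as it is clear which definition is meant.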

So, my suggestion would be to use `scipy.stats.kurtosis`, because it lets you choose explicitly which definition of kurtosis you want.

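The pandas method `res.resid.kurtosis()` also computes Fisher kurtosis, but it applies a small-sample bias correction, which is why it gives a slightly different value (-0.264 instead of -0.274). Calling `scipy.stats.kurtosis` with `bias=False` applies the same correction. Since the statsmodels summary uses the uncorrected estimator, I would stick with SciPy's defaults if the goal is to match the summary.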
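The gap between pandas and SciPy can be reproduced: `Series.kurtosis()` returns the bias-corrected excess kurtosis, which SciPy also computes when called with `bias=False`. The sketch below checks this on made-up data and then plugs the numbers from the question (n = 442, SciPy value -0.2740841793704205) into the standard correction formula. `bias_corrected` is a hypothetical helper name, not a library function.

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis


def bias_corrected(g2, n):
    """Convert the biased excess kurtosis g2 into the bias-corrected estimate."""
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


# Synthetic sample (an assumption for illustration).
rng = np.random.default_rng(0)
x = rng.normal(size=442)

pandas_value = pd.Series(x).kurtosis()     # bias-corrected excess kurtosis
scipy_unbiased = kurtosis(x, bias=False)   # same estimator in SciPy
from_formula = bias_corrected(kurtosis(x), len(x))

print(np.allclose([scipy_unbiased, from_formula], pandas_value))

# Values reported in the question: SciPy gave -0.2740841793704205 with n = 442.
print(f"{bias_corrected(-0.2740841793704205, 442):.3f}")
```

The last line yields -0.264, the value the array method printed in the question, so the two libraries agree once the same estimator is selected.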
