I want to use multiprocessing for linear regression modelling when calculating different confidence intervals for the model.
In this example I'm using the dataset from https://www.geeksforgeeks.org/linear-regression-in-python-using-statsmodels/.
I've fitted the model and viewed the confidence intervals with `print(model.summary())`; the default confidence level is 95%. I know that you can set it to, e.g., 99% using the `alpha` argument: `model.summary(alpha=0.01)`.
I'd expect the output of my code to be a list of summaries with different confidence intervals. The problem with the code below is that every summary in the list has the same default 95% confidence interval. So clearly passing the different confidence intervals isn't working. But how do I make it work?
Thanks!
```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from multiprocessing import Pool

# Data
df = pd.read_csv('C:\\Users\\Me\\Desktop\\headbrain1.csv')

# Model
df.columns = ['Head_size', 'Brain_weight']
model = smf.ols(formula='Head_size ~ Brain_weight', data=df).fit()

# Summaries
if __name__ == "__main__":
    pool = Pool()
    summaries_list = pool.map(model.summary, [0.05, 0.04, 0.01])
    print(summaries_list)
```
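The issue is that `Pool.map` passes each value to `model.summary` as its first positional argument, so each worker runs the equivalent of `model.summary(0.05)`. The first parameter of `summary()` is `yname` (the name of the dependent variable), not `alpha`, so every summary keeps the default `alpha=0.05`, i.e. a 95% confidence interval. To get around this, wrap the call in a small function that passes the value as the `alpha` keyword argument. The wrapper has to be a regular function defined at module level: `multiprocessing` pickles the callable to send it to the worker processes, and a lambda can't be pickled. `functools.partial` can't route the mapped value to the `alpha` keyword either, because `map` always supplies it positionally, so a named wrapper is the simplest fix.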
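To see why the original `pool.map(model.summary, ...)` call ignores the values, here's a minimal sketch that needs neither statsmodels nor the CSV. `fake_summary` is a hypothetical stand-in that only copies the order of the leading parameters of `summary()` (assumed to be `yname, xname, title, alpha`, as in recent statsmodels releases). The built-in `map` is used instead of `Pool.map` because both call the function the same way: one item per call, passed as a single positional argument.

```python
# Hypothetical stand-in mimicking the parameter order of statsmodels'
# RegressionResults.summary(yname, xname, title, alpha, ...).
# It only reports which arguments it received; nothing is fitted.
def fake_summary(yname=None, xname=None, title=None, alpha=0.05):
    return {"yname": yname, "alpha": alpha}


alphas = [0.05, 0.04, 0.01]

# map() (like Pool.map) passes each item as the FIRST positional argument,
# so every value lands in yname and alpha keeps its default of 0.05.
positional = list(map(fake_summary, alphas))
print(positional)


# A module-level wrapper forwards each value as the alpha keyword instead.
def get_fake_summary(alpha):
    return fake_summary(alpha=alpha)


keyword = list(map(get_fake_summary, alphas))
print(keyword)
```

In the first list every `alpha` is still 0.05 while `yname` holds the values you meant to pass; in the second list each call receives its own `alpha`.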
Here's one way to modify your code using a wrapper function:
```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from multiprocessing import Pool

df = pd.read_csv('C:\\Users\\Me\\Desktop\\headbrain1.csv')
df.columns = ['Head_size', 'Brain_weight']
model = smf.ols(formula='Head_size ~ Brain_weight', data=df).fit()

# Function to get summary with different alpha values.
# Defined at module level so Pool can pickle it; alpha is passed by keyword
# because summary()'s first positional parameter is yname, not alpha.
def get_summary(alpha):
    return model.summary(alpha=alpha)

# Summaries
if __name__ == "__main__":
    pool = Pool()
    alphas = [0.05, 0.04, 0.01]
    summaries_list = pool.map(get_summary, alphas)
    pool.close()
    pool.join()
    for summary in summaries_list:
        print(summary)
```
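The wrapper is a regular `def` rather than a lambda because `Pool.map` has to pickle the callable it sends to the worker processes, and `pickle` stores a function by reference to its module and qualified name. Every lambda's qualified name is `<lambda>`, which can't be looked up again. A minimal stdlib-only sketch of the difference, using the built-in `abs` as an example of a function that pickle can find by name:

```python
import pickle

# A lambda's qualified name is "<lambda>", so pickle cannot store a
# reference to it; this is the same failure Pool.map would hit.
try:
    pickle.dumps(lambda x: x * x)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_pickled = False
    print(f"Cannot pickle the lambda: {type(exc).__name__}")

# A named function reachable by module + name (here builtins.abs)
# round-trips through pickle and comes back as the same object.
named_roundtrip = pickle.loads(pickle.dumps(abs)) is abs
print(f"Named function round-trips: {named_roundtrip}")
```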