python pandas matplotlib regression statsmodels

How to save each statsmodels model fitted in a loop to a file for later use?

I have the following table generated:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# Generate 'random' data
np.random.seed(0)
X = 2.5 * np.random.randn(10) + 1.5
res = 0.5 * np.random.randn(10)
y = 2 + 0.3 * X + res
Name = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

# Create pandas dataframe to store our X and y values
df = pd.DataFrame(
    {'Name': Name,
     'X': X,
     'y': y})

# Show the dataframe
df

Resulting in the following table:

Name	X	y
A	5.910131	3.845061
B	2.500393	3.477255
C	3.946845	3.564572
D	7.102233	4.191507
E	6.168895	4.072600
F	-0.943195	1.883879
G	3.875221	3.909606
H	1.121607	2.233903
I	1.241953	2.529120
J	2.526496	2.330901

The following code iterates over the rows, excludes one row at a time, and builds a regression plot for each reduced dataset:

import statsmodels.formula.api as smf
import warnings
warnings.filterwarnings('ignore')
# Initialise and fit linear regression model using `statsmodels`

for row_index, row in df.iterrows():
    # dataframe with all rows except for one
    df_reduced = df[~(df.index == row_index)]
    model = smf.ols('X ~ y', data=df_reduced)
    model = model.fit()
    intercept, slope = model.params
    print(model.summary())

    y1 = intercept + slope * df_reduced.y.min()
    y2 = intercept + slope * df_reduced.y.max()
    plt.plot([df_reduced.y.min(), df_reduced.y.max()], [y1, y2], label=row.Name, color='red')
    plt.scatter(df_reduced.y, df_reduced.X)
    plt.legend()
    plt.savefig(f"All except {row.Name} analogue.pdf")
    plt.show()

The question is: how can I save each of the generated models as a file that can be used later? In this example the loop fits one regression model per excluded row (10 in total). I would like each of them saved to its own file, with a name that identifies it.

My second question is: how can I add some space between each model summary and its plot in the output?

Solution

You just need to add this line inside your loop, after the call to fit(): model.save(f"model_{row_index}.pkl")
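
For reference, here is a minimal sketch of the full save-and-reload round trip, using the same synthetic data as in the question. The file name pattern (model_without_<Name>.pkl) and the temporary output folder are assumptions made for illustration, not something the question specifies; the fitted results are read back with OLSResults.load, which should work for models fitted with smf.ols.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.regression.linear_model import OLSResults

# Same synthetic data as in the question
np.random.seed(0)
X = 2.5 * np.random.randn(10) + 1.5
res = 0.5 * np.random.randn(10)
y = 2 + 0.3 * X + res
df = pd.DataFrame({'Name': list('ABCDEFGHIJ'), 'X': X, 'y': y})

# Hypothetical output folder; a temporary directory is used only for illustration
out_dir = Path(tempfile.mkdtemp())

saved_paths = {}    # excluded row name -> path of the pickled results
fitted_params = {}  # excluded row name -> in-memory parameters, kept for comparison

for row_index, row in df.iterrows():
    # Fit on all rows except the current one
    df_reduced = df.drop(index=row_index)
    results = smf.ols('X ~ y', data=df_reduced).fit()

    # Name the file after the excluded row so it can be identified later
    path = out_dir / f"model_without_{row['Name']}.pkl"
    results.save(str(path))

    saved_paths[row['Name']] = path
    fitted_params[row['Name']] = results.params

# Later (possibly in another session): reload one of the saved results objects
loaded = OLSResults.load(str(saved_paths['A']))
print(loaded.params)
```

The reloaded object is a regular results instance, so loaded.summary() and loaded.params behave as they did before saving. These files are pickles, so only load files you created yourself or otherwise trust.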