python, pandas, statsmodels

Shapes not aligned for exactly the same shapes in statsmodels


I'm using pandas DataFrames and Series as training and test data. I checked the shapes of my training and test DataFrames and they are identical, but I still get a "shapes not aligned" error. Here is my fit/predict code:

import statsmodels.api as sm

train_df = df.loc[:50]
X_train = train_df[["Value", "Momentum", "Quality", "MinimumVolatility"]]
y_train = train_df["P1ExRe"]

X_train = sm.add_constant(X_train)

model = sm.OLS(y_train, X_train)
results = model.fit()
test_df = df.loc[51:100]
x_test = test_df[["Value", "Momentum", "Quality", "MinimumVolatility"]]
y_test = test_df["P1ExRe"]

print(x_test.shape==X_train.shape)
model.predict(x_test)

Here is the error:

ValueError                                Traceback (most recent call last)
<ipython-input-108-832ad1f6bc61> in <module>
      4 
      5 print(x_test.shape==X_train.shape)
----> 6 model.predict(x_test)

~/projects/courserads/venv/lib/python3.6/site-packages/statsmodels/regression/linear_model.py in predict(self, params, exog)
    378             exog = self.exog
    379 
--> 380         return np.dot(exog, params)
    381 
    382     def get_distribution(self, params, scale, exog=None, dist_class=None):

<__array_function__ internals> in dot(*args, **kwargs)

ValueError: shapes (50,5) and (50,5) not aligned: 5 (dim 1) != 50 (dim 0)
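The message comes from the np.dot call in the last traceback frame. Identical shapes are not enough for a matrix product: the number of columns of the first operand must equal the number of rows of the second. A minimal sketch with plain NumPy, using placeholder arrays that only mimic the shapes in the message (not the asker's data):

```python
import numpy as np

# Placeholder arrays with the shapes from the error message:
# the training design matrix (self.exog) and the frame passed as params.
exog = np.ones((50, 5))
params = np.ones((50, 5))

message = None
try:
    # Inner dimensions differ: 5 columns vs 50 rows, so NumPy refuses.
    np.dot(exog, params)
except ValueError as err:
    message = str(err)
    print(message)
```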

Solution

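  • You are calling model.predict. You should call results.predict(...) instead.

    The model's predict method has the signature predict(params, exog=None) and needs the parameters passed explicitly, because only the results object holds the estimated parameters.

    In model.predict(x_test), x_test is therefore interpreted as params, and exog falls back to the training design matrix self.exog. Statsmodels then computes np.dot(self.exog, x_test), a (50,5) by (50,5) product, which causes the shape mismatch.

    Also, if x_test does not already include it, add the constant to the test data the same way as to the training data (sm.add_constant). Otherwise x_test has four columns while the fit estimated five parameters, and the prediction still fails.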