python machine-learning statsmodels quantile-regression

Quantile regression prediction


I fitted quantile regression models with statsmodels.formula.api.quantreg and tried to predict on the test set. The predict call raised an unexpected error:

AttributeError                            Traceback (most recent call last)
<ipython-input-34-12e0d345b0fc> in <module>
----> 1 test['ypredL'] = model1.predict( test ).values
      2 test['FVC']    = model2.predict( test ).values
      3 test['ypredH'] = model3.predict( test ).values
      4 test['Confidence'] = np.abs(test['ypredH'] - test['ypredL']) / 2

~\anaconda3\envs\knk\lib\site-packages\statsmodels\base\model.py in predict(self, exog, transform, *args, **kwargs)
   1081                        '\n\nThe original error message returned by patsy is:\n'
   1082                        '{0}'.format(str(str(exc))))
-> 1083                 raise exc.__class__(msg)
   1084             if orig_exog_len > len(exog) and not is_dict:
   1085                 import warnings

AttributeError: predict requires that you use a DataFrame when predicting from a model
that was created using the formula api.

The original error message returned by patsy is:
'DataFrame' object has no attribute 'dtype'

The puzzling part is that the same predict call worked fine on the training set. Here is the training code:

from statsmodels.formula.api import quantreg

# Fit one model per quantile: lower bound, median, upper bound
formula = 'FVC ~ Weeks+Percent+Age+Sex+SmokingStatus'
model1 = quantreg(formula, train).fit(q=0.25)
model2 = quantreg(formula, train).fit(q=0.5)
model3 = quantreg(formula, train).fit(q=0.75)

train['y_predL'] = model1.predict(train).values
train['y_pred'] = model2.predict(train).values
train['y_predH'] = model3.predict(train).values



Solution

  • The patsy message 'DataFrame' object has no attribute 'dtype' is accurate but misleading. What it indicates here is that a column has a different dtype in the training set than in the test set. In the question, the mismatch was in the Weeks column:

    dtype of Train-Weeks is int, and dtype of Test-Weeks is str.

    Converting Weeks in the test set to int before calling predict should remove the mismatch.
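
A minimal sketch of how such a mismatch could be found and fixed with pandas. The two small frames and the helper dtype_mismatches are hypothetical and only stand in for the question's train and test data; they are not part of statsmodels or of the original code.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data: Weeks is int in train, str in test.
train = pd.DataFrame({"Weeks": [0, 5, 12], "Percent": [58.2, 55.7, 60.1]})
test = pd.DataFrame({"Weeks": ["6", "15", "20"], "Percent": [70.2, 82.0, 76.7]})


def dtype_mismatches(reference, other):
    """Map each shared column whose dtype differs to (reference dtype, other dtype)."""
    shared = [col for col in reference.columns if col in other.columns]
    return {
        col: (reference[col].dtype, other[col].dtype)
        for col in shared
        if reference[col].dtype != other[col].dtype
    }


# Report the columns that disagree before calling predict.
mismatches = dtype_mismatches(train, test)
print(mismatches)

# Cast each offending test column to the dtype used during training.
for col in mismatches:
    test[col] = test[col].astype(train[col].dtype)

# After the cast, no mismatches should remain.
remaining = dtype_mismatches(train, test)
print(remaining)
```

Once the dtypes agree, model1.predict(test) from the question can be retried. Note that astype raises a ValueError if a test value cannot be parsed as a number, which is itself a useful signal that the test column contains unexpected entries.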