python time-series xgboost stock

XGBoost feature_names mismatch time series


I am trying to predict a stock's trend, where 1 means the stock goes up and 0 means it goes down on a given day. My input features are the close price, the volume, and the current day's trend, and my output is the trend for the next day. When I apply XGBClassifier(), I get this error:

ValueError                                Traceback (most recent call last)
<ipython-input-101-d14cdb520e55> in <module>
      1 val = np.array(test[0, 0]).reshape(1, -1)
      2 
----> 3 pred = model.predict(val)
      4 print(pred[0])

~/opt/anaconda3/lib/python3.8/site-packages/xgboost/sklearn.py in predict(self, data, output_margin, ntree_limit, validate_features, base_margin)
    968         if ntree_limit is None:
    969             ntree_limit = getattr(self, "best_ntree_limit", 0)
--> 970         class_probs = self.get_booster().predict(
    971             test_dmatrix,
    972             output_margin=output_margin,

~/opt/anaconda3/lib/python3.8/site-packages/xgboost/core.py in predict(self, data, output_margin, ntree_limit, pred_leaf, pred_contribs, approx_contribs, pred_interactions, validate_features, training)
   1483 
   1484         if validate_features:
-> 1485             self._validate_features(data)
   1486 
   1487         length = c_bst_ulong()

~/opt/anaconda3/lib/python3.8/site-packages/xgboost/core.py in _validate_features(self, data)
   2058                             ', '.join(str(s) for s in my_missing))
   2059 
-> 2060                 raise ValueError(msg.format(self.feature_names,
   2061                                             data.feature_names))
   2062 

ValueError: feature_names mismatch: ['f0', 'f1', 'f2', 'f3'] ['f0']
expected f2, f1, f3 in input data

My code is as follows:

def xgb_predict(train, val):
    train = np.array(train)
    x, y = train[:, :-1], train[:, -1] 
    model = XGBClassifier()
    model.fit(x, y)
    
    val = np.array(val).reshape(1, -1)
    pred = model.predict(val)
    return pred[0]

xgb_predict(train, test[0, 0])

The error is raised on the 8th line of the function (pred = model.predict(val)). Thank you very much for the help :)

Edit: Included a sample of data


Solution

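  • The test data must be sliced the same way as the training data. The model was fitted on four feature columns (f0 to f3), but test[0, 0] passes a single value, so XGBoost only sees f0 and raises the mismatch error. Your last line should be:

    xgb_predict(train, test[0, :-1])

    This selects every column of the first test row except the last one, which is the target value.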
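
To see the difference between the two slices without fitting a model, here is a minimal sketch that only compares shapes. The array below is made-up random data, assumed to have four feature columns followed by the target column, which matches the four names (f0 to f3) in the error message; it is not the asker's real data.

```python
import numpy as np

# Hypothetical stand-in for the asker's array: 4 feature columns, target last.
rng = np.random.default_rng(0)
test = np.column_stack([rng.random((5, 4)), rng.integers(0, 2, size=5)])

# Same split the question uses for training: everything but the last column.
n_features = test[:, :-1].shape[1]

# Original call: a single scalar, which reshapes to one row with ONE feature.
wrong = np.array(test[0, 0]).reshape(1, -1)

# Suggested call: the first row without the target, one row with FOUR features.
right = np.array(test[0, :-1]).reshape(1, -1)

print(wrong.shape)                    # (1, 1)
print(right.shape)                    # (1, 4)
print(right.shape[1] == n_features)   # True
```

The row passed to predict needs as many columns as the x used in fit, which is why the (1, 1) input is rejected and the (1, 4) input should be accepted.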