Tags: python, tensorflow, keras, scikit-learn, svm

ValueError: X has 1 features, but SVC is expecting 3 features as input


I am trying to build a stock price predictor (not to actually invest with, don't worry) using Keras and sklearn. It takes any of the time series from Kaggle and reads the "Close" column. It then slides a rolling time window of a fixed length over the series and predicts the direction of the next move: up (1) or down (0).

While trying to run the code below, the following error came up:

File "...", line 71, in test
    y_pred = self.model.predict(self.X_test)
ValueError: X has 1 features, but SVC is expecting 3 features as input.

Can someone guide me on what could possibly be the issue? What are the features that SVC is expecting that I may be missing?

CODE: Model.py

create_features checks if market is lower or higher according to rolling time window, and sets X and y:

#window_size = the set size of the rolling time window

def create_features(data, window_size):
    X = []
    y = []

    for i in range(0, len(data.index) - window_size):
        temp = [data.iloc[i + j]['Close'] for j in range(0, window_size)]
        avg = sum(temp) / len(temp)

        X.append(temp)
        y.append(0 if data.iloc[i + window_size]['Close'] < avg else 1)

    return X, y


class Model:
    def __init__(self, market: Market, training_percent: float, window_size: int):
        self.model = SVC(C=10, gamma='scale', kernel='rbf')

        X, y = create_features(market.data, window_size)
        self.X_train, self.y_train, self.X_test, self.y_test = train_test_split(X, y, shuffle=False, stratify=None, train_size=training_percent)

        self.X_train = np.array(self.X_train)
        self.y_train = np.array(self.y_test)

        #self.X_test = np.array(self.X_test).reshape(-1, 1)

    def train(self):
        self.model.fit(self.X_train, self.y_train)

    def test(self):
        y_pred = self.model.predict(self.X_test) #THE COMPLAINING LINE

        y_pred = [0 if i < 0.5 else 1 for i in y_pred]
        tn, fp, fn, tp = confusion_matrix(self.y_test, y_pred, labels=[0, 1]).ravel()
        print(tn, fp, fn, tp)
        print("Accuracy:", (tn + tp) / (tn + fp + fn + tp))

    def predict(self, input_array):
        return self.model.predict(input_array)

The above are called as:

model_test = Model(markets[m], training_testing[j], window_size[i])
model_test.train()
model_test.test()

Any help with this problem would be greatly appreciated. Thank you in advance.


Solution

  • The problem is how you unpack the output of train_test_split. As the documentation states, it returns the splits in the order X_train, X_test, y_train, y_test:

    # Notice the order of the unpacking.
    self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(X, y, shuffle=False, stratify=None, train_size=training_percent)
    

    With the original order, self.X_test actually held the training labels, so its shape did not match the 3-feature windows the SVC was fitted on. With the correct order you will not need the .reshape either.

    Also, this line is probably not what you intended. It assigns y_test to y_train, and most likely should read np.array(self.y_train):

    # Assigning y_test to y_train.
    self.y_train = np.array(self.y_test)
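
To check the fix in isolation, here is a minimal sketch, not the asker's actual pipeline. It assumes a synthetic random-walk series in place of the Kaggle "Close" column and a hypothetical window_size of 3 (chosen only to match the "3 features" in the error message). It uses the same labelling rule as create_features, unpacks train_test_split in the documented order, and scores with accuracy_score, which computes (tn + tp) / total.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: synthetic random walk standing in for the real "Close" prices.
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(size=200))

window_size = 3  # hypothetical value, matching the "3 features" in the error

# Each row of X is one rolling window; y is 1 when the next close is at or
# above the window mean and 0 otherwise (same rule as create_features).
n_samples = len(close) - window_size
X = np.array([close[i:i + window_size] for i in range(n_samples)])
y = np.array([0 if close[i + window_size] < X[i].mean() else 1
              for i in range(n_samples)])

# train_test_split returns X_train, X_test, y_train, y_test, in that order.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, shuffle=False, train_size=0.8
)

model = SVC(C=10, gamma="scale", kernel="rbf")
model.fit(X_train, y_train)

# SVC.predict already returns class labels (0 or 1), so no thresholding is needed.
y_pred = model.predict(X_test)

print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Accuracy:", accuracy_score(y_test, y_pred))
```

Both X_train and X_test keep window_size columns here, which is what predict needs in order to accept the test set without any reshape.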