python, pandas

How can I solve this error: ValueError: The feature names should match those that were passed during fit


I am trying to train a model and predict with it, and I currently get this error:

ValueError: The feature names should match those that were passed during fit. Feature names unseen at fit time:

This is my code:

X_train = X_train.drop(columns=['InvoiceDate', "BillingAddress", "BillingCity", "BillingState", "BillingCountry", "BillingPostalCode", "Rowversion_x", "Rowversion_y", "Rowversion", "Name", "Composer"], axis=1)


print(f'Tipo X_train: {type(X_train)} Tipo y_train: {type(y_train)}')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

mining_data.to_csv('mining_table.csv', index=False)

I don't understand what the error means. I am excluding some columns because I was previously getting another error, like this:

ValueError: could not convert string to float: 'Calle Lira, 198'


Solution

  • You dropped columns from X_train but not from X_test, so the model is telling you that at prediction time it sees columns it never encountered during training. Drop the same columns from X_test before calling clf.predict.
    The columns in your inputs must be the same for both train and test, in number and in name. If the number of columns already matches and you still hit the error, check that the columns you are passing are the right ones. If they are, and only the column titles differ (for example because of a typo), either rename them so they match or pass the NumPy version of your data for both calls, e.g. X_train.values for fit and X_test.values for predict, which skips the feature-name check.
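
As a minimal sketch of the fix described above, the snippet below drops the same columns from both frames and then checks that their columns agree. The toy data, the Quantity and UnitPrice columns, and the cols_to_drop, only_in_test and only_in_train names are assumptions made for illustration; only BillingCity comes from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the real train/test frames.
X_train = pd.DataFrame({
    "Quantity": [1, 2, 3],
    "UnitPrice": [0.99, 1.99, 0.99],
    "BillingCity": ["Madrid", "Oslo", "Lima"],
})
X_test = pd.DataFrame({
    "Quantity": [4, 5],
    "UnitPrice": [1.99, 0.99],
    "BillingCity": ["Paris", "Rome"],
})

# Keep the list in one place so both frames lose exactly the same columns.
cols_to_drop = ["BillingCity"]
X_train = X_train.drop(columns=cols_to_drop)
X_test = X_test.drop(columns=cols_to_drop)

# Columns present in one frame but not the other (both sets should be empty).
only_in_test = set(X_test.columns) - set(X_train.columns)
only_in_train = set(X_train.columns) - set(X_test.columns)
print("Only in test:", only_in_test)
print("Only in train:", only_in_train)

# Put the test columns in the same order as the training columns.
X_test = X_test[X_train.columns]

# clf.fit(X_train, y_train) and clf.predict(X_test) would follow here.
```

Printing the two set differences before fitting is a quick way to see which column names are causing the mismatch when the error message is truncated.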