I have trained my Gradient Boosting Classifier and saved the model using pickle:
import pickle
with open("model.bin", 'wb') as f_out:
    pickle.dump(xgb_clf, f_out)
My data source was a .csv file.
Now I need to test the model's performance on completely new data, but I do not know how.
I found several tutorials, but was unable to proceed.
I understand that the key is to load the saved model:
with open('model.bin', 'rb') as f_in:
    model = pickle.load(f_in)
but I do not know how to apply this model to the new data I have in a .csv file.
Could you help, please?
Thank you.
The model object you are using should have a prediction method, similar to model.predict(x), depending on the library (I'm assuming it is scikit-learn).
You need to load the data from the .csv file:
import pandas as pd
data = pd.read_csv('data.csv')
Select the columns that belong to x (the same feature columns, in the same order, that the model was trained on):
x = data[['col1', 'col2']]
And call the prediction:
res = model.predict(x)
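Since the goal is to test performance, the predictions also have to be compared against the true labels. The following is a minimal sketch of the whole flow, assuming a scikit-learn-style classifier and assuming the new .csv file also contains a label column. The function name evaluate_saved_model, the file name new_data.csv, and the column names col1, col2 and target are placeholders, not taken from your data. Accuracy is only one possible metric; others from sklearn.metrics may suit your problem better.

```python
import pickle

import pandas as pd
from sklearn.metrics import accuracy_score


def evaluate_saved_model(model_path, csv_path, feature_cols, target_col):
    """Load a pickled classifier and score it on labelled data from a CSV file.

    Hypothetical helper: all arguments are supplied by the caller.
    """
    # Load the model that was saved earlier with pickle.dump
    with open(model_path, "rb") as f_in:
        model = pickle.load(f_in)

    # Read the new data and split it into features and true labels
    data = pd.read_csv(csv_path)
    x_new = data[feature_cols]  # same features, same order as in training
    y_true = data[target_col]   # assumed to be present in the new file

    # Predict, then compare the predictions with the true labels
    y_pred = model.predict(x_new)
    return accuracy_score(y_true, y_pred), y_pred
```

It could be called like this (placeholder names again):
acc, preds = evaluate_saved_model("model.bin", "new_data.csv", ["col1", "col2"], "target")
If the new data has no label column, only model.predict(x) can be run and no performance score can be computed.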