python, pickle, prediction

Using a trained GB classifier on new data


I have trained my Gradient Boosting Classifier and saved the model using pickle:

import pickle

with open("model.bin", 'wb') as f_out:
    pickle.dump(xgb_clf, f_out)

My data source was a .csv file.

Now I need to test the performance on completely new data, but I do not know how.

I found several tutorials, but was unable to proceed.

I understand that the key is to load the saved model:

with open('model.bin', 'rb') as f_in:
    model = pickle.load(f_in)

but I do not know how to apply this model to the new data I have in a CSV file.

Could you help, please?

Thank you.


Solution

  • The loaded model object should expose a prediction method such as model.predict(x). The exact name depends on the library; I'm assuming scikit-learn, whose classifiers provide predict (XGBoost's XGBClassifier offers the same method).

    You need to load the data from the .csv file:

    import pandas as pd
    data = pd.read_csv('data.csv')

    Select the feature columns the model was trained on, in the same order:

    x = data[['col1', 'col2']]
    

    Then call predict on them:

    res = model.predict(x)
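
Since the question asks about testing performance, not just predicting, here is a minimal end-to-end sketch of the steps above plus a score. It assumes the new CSV also contains the true labels. The helper evaluate_saved_model, its parameters feature_cols and target_col, and the column name "target" are hypothetical names chosen for illustration. So that it runs without any external files, the demo trains a scikit-learn GradientBoostingClassifier on synthetic data as a stand-in for the xgb_clf from the question.

```python
import os
import pickle
import tempfile

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score


def evaluate_saved_model(model_path, csv_path, feature_cols, target_col):
    """Load a pickled classifier and score it on labelled rows from a CSV file."""
    with open(model_path, "rb") as f_in:
        model = pickle.load(f_in)
    data = pd.read_csv(csv_path)
    x_new = data[feature_cols]   # same columns, same order as during training
    y_true = data[target_col]    # known labels for the new data (assumed present)
    y_pred = model.predict(x_new)
    return y_pred, accuracy_score(y_true, y_pred)


# Demo on synthetic data; replace with your own model.bin and CSV in practice.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
cols = ["col1", "col2", "col3", "col4"]  # hypothetical feature names
frame = pd.DataFrame(X, columns=cols)
frame["target"] = y

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.bin")
    csv_path = os.path.join(tmp, "new_data.csv")

    # Train on the first 150 rows and pickle the model, as in the question.
    clf = GradientBoostingClassifier(random_state=0)
    clf.fit(frame.loc[:149, cols], frame.loc[:149, "target"])
    with open(model_path, "wb") as f_out:
        pickle.dump(clf, f_out)

    # Treat the remaining 50 rows as the "completely new" data.
    frame.loc[150:].to_csv(csv_path, index=False)

    predictions, accuracy = evaluate_saved_model(model_path, csv_path, cols, "target")

print(f"accuracy on new data: {accuracy:.3f}")
```

If the new data has no labels, only the model.predict step applies; measuring performance needs the true values to compare against. Accuracy is just one choice here, and other metrics from sklearn.metrics may suit an imbalanced problem better.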