Tags: python, arrays, numpy, typeerror, xgboost

type str doesn't define __round__ method error


I am trying to use XGBoost to determine the most important variables, but I am getting an error with the arrays.

My complete code is the following:

from numpy import loadtxt
from numpy import sort
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import SelectFromModel


df = pd.read_csv('data.txt')
array=df.values
X= array[:,0:330]
Y = array[:,330]

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=7)


model = XGBClassifier()
model.fit(X_train, y_train)


y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]

and I get the following error:

TypeError: type str doesn't define __round__ method

What can I do?


Solution

  • More than likely, some of the labels in y_train are strings rather than numbers. sklearn and xgboost don't require the labels to be numeric, so predict can return strings as well.

    Try checking the types of the values in y_pred:

    from collections import Counter
    
    Counter([type(value) for value in y_pred])
    

    Here is an example of what I mean, using numeric labels:

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    
    # test with numeric labels
    x = np.vstack([np.arange(100), np.sort(np.random.normal(10, size=100))]).T
    y = np.hstack([np.zeros(50, dtype=int), np.ones(50, dtype=int)])
    model = GradientBoostingClassifier()
    model.fit(x,y)
    model.predict([[10,7]])
    # returns an array with a numeric label
    array([0])
    

    and here with string labels (same x data):

    y = ['a']*50 + ['b']*50
    model.fit(x,y)
    model.predict([[10,7]])
    # returns an array with a string label
    array(['a'], dtype='<U1')
    

    Both are valid labels. However, when you call round on a string, you get exactly the error you are seeing:

    round('a')
    
    TypeError: type str doesn't define __round__ method
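
One possible way around the error, sketched under the assumption that the label column really does contain strings: encode the labels as integers before fitting, and drop the round call, since a classifier's predict already returns class labels. The toy data and the 'a'/'b' labels below are made up for illustration and stand in for the question's X and Y; GradientBoostingClassifier is used only to keep the example consistent with the one above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Toy data standing in for the question's X and Y (hypothetical values).
rng = np.random.default_rng(0)
X = np.column_stack([np.arange(100), np.sort(rng.normal(10, size=100))])
y_str = np.array(['a'] * 50 + ['b'] * 50)  # string labels, as suspected

# Map each distinct label to an integer code: 'a' -> 0, 'b' -> 1.
encoder = LabelEncoder()
y = encoder.fit_transform(y_str)

model = GradientBoostingClassifier(random_state=0)
model.fit(X, y)

# Predictions are integer class codes, so round() is not needed.
y_pred = model.predict(X)
accuracy = accuracy_score(y, y_pred)

# Recover the original string labels if they are needed for reporting.
y_pred_labels = encoder.inverse_transform(y_pred)
```

If the strings turn out to be numbers read in as text (for example '0' and '1'), converting the column with astype(int) before fitting may be enough on its own.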