python, pandas, scikit-learn, decision-tree, sklearn-pandas

ValueError: could not convert string to float: 'what' (sklearn): how to use LabelEncoder?


I have two training sets, an input set and an output set:

X = df['First Word']

y = df['Answers']

When I tried:

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X,y)
predictions = model.predict(['how'])

I got the error:

ValueError: could not convert string to float: 'what'

The error means that strings cannot be passed to the fit() method.

How do I use the LabelEncoder in this case so that the above code works?


Solution

  • scikit-learn's DecisionTreeClassifier requires numeric features, so you need to encode the string column first, using either a label encoder or one-hot encoding, depending on your needs.

    You can encode the column with the code below:

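     from sklearn import preprocessing
     le = preprocessing.LabelEncoder()
     # fit() expects a 2D feature array, so reshape the encoded column
     X = le.fit_transform(X).reshape(-1, 1)
     model.fit(X, y)
     predictions = model.predict(le.transform(['how']).reshape(-1, 1))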
    

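    After encoding, pass X to the model as before and the error should go away. Note that anything you pass to predict() must be encoded with the same fitted encoder.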
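
As an alternative, here is a minimal end-to-end sketch using one-hot encoding instead. The scikit-learn documentation describes LabelEncoder as intended for target values (y), so for a feature column OneHotEncoder may be the safer choice. The DataFrame contents below are made up for illustration (only the column names come from the question), and wrapping the encoder and the tree in a pipeline is one possible design, not the only one.

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "First Word": ["what", "how", "what", "who", "how", "who"],
    "Answers": ["thing", "method", "thing", "person", "method", "person"],
})

X = df[["First Word"]]  # double brackets keep X two-dimensional
y = df["Answers"]       # string class labels are accepted as targets

# The pipeline encodes the strings, then fits the tree on the encoded columns.
# handle_unknown="ignore" encodes unseen words as all zeros instead of raising.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    DecisionTreeClassifier(random_state=0),
)
model.fit(X, y)

# New input goes through the same fitted encoder automatically.
new = pd.DataFrame({"First Word": ["how"]})
predictions = model.predict(new)
print(predictions)
```

Because the encoder lives inside the pipeline, predict() can be called with raw strings and there is no separate transform step to forget.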