pandas random-forest

Problem with changing column category - could not convert string to float


I wanted to change the column type to category with the following code:

df["Geography"] = df["Geography"].astype("category")

Then I set up a random forest as follows:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=123, stratify=y)

forest = RandomForestClassifier(n_estimators=500, random_state=1)

And when fitting the algorithm:

forest.fit(X_train, y_train)

The following error occurs:

could not convert string to float: 'Spain'

'Spain' is a value in the Geography column, which I converted to a categorical type. Why do I get an error?


Solution

  • Changing the dtype to "category" does not make the column numeric: the categories are still the country names, which RandomForestClassifier cannot convert to floats. If you need the categories as numbers, you can use the integer codes of the categorical index:

    df["Geography"] = pd.CategoricalIndex(df["Geography"]).codes
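To make the point above concrete, here is a minimal sketch on a small made-up DataFrame (the rows and the "Age" column are hypothetical and not from the question). It shows that a "category" column still holds strings, and then two common ways to get numeric features: integer codes via .cat.codes, and one-hot columns via pd.get_dummies. Which one suits your data is a judgment call. Integer codes imply an ordering between countries, which one-hot encoding avoids.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the question's DataFrame.
df = pd.DataFrame({
    "Geography": ["Spain", "France", "Germany", "Spain"],
    "Age": [42, 35, 51, 29],
})
df["Geography"] = df["Geography"].astype("category")

# The dtype is "category", but the underlying values are still strings,
# so asking for a float array fails, much like the estimator's input check.
try:
    np.asarray(df["Geography"], dtype=float)
    conversion_failed = False
except (ValueError, TypeError):
    conversion_failed = True

# Option 1: integer codes. Categories are sorted alphabetically here,
# so France -> 0, Germany -> 1, Spain -> 2.
codes = df["Geography"].cat.codes

# Option 2: one-hot encoding, one indicator column per country.
X_onehot = pd.get_dummies(df, columns=["Geography"])
```

Either the frame with the codes column or X_onehot contains only numeric (or boolean) values, so it should be accepted by forest.fit in place of the original X.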