
Logistic Regression using sklearn


I have been working with the Titanic data set, and I get a TypeError when I try to fit the model.

**Step 4: Train & Test Data**: Build the model on the training data and predict the output on the test data.

**Train Data**

```
X = titanic_data.drop("Survived", axis=1)
y = titanic_data["Survived"]

from sklearn.model_selection import train_test_split
# train_test_split is imported from sklearn.model_selection here because the
# older cross_validation sub-module was renamed to model_selection and deprecated.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

from sklearn.linear_model import LogisticRegression

logmodel = LogisticRegression()

logmodel.fit(X_train, y_train)
TypeError: Feature names are only supported if all input features have string names, but your input has ['int', 'str'] as feature name / column name types. If you want feature names to be stored and validated, you must convert them all to strings, by using X.columns = X.columns.astype(str) for example. Otherwise you can remove feature / column names from your input data, or convert them all to a non-string data type.
```

Solution

  • The error points to the column names of X: some are strings and others are integers. Adding X.columns = X.columns.astype(str) before the train/test split converts every column name to str (so the integer 0 becomes the string "0", and so on), which is the fix the error message itself suggests.
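
The fix can be sketched end to end as follows. This is a minimal illustration, not the asker's actual data: the small `titanic_data` frame below is made up, and its integer column names `2` and `3` are an assumption about how mixed names might arise (for example, dummy columns built from a numeric column). `max_iter=1000` is also an addition, used only to help the tiny example converge.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for titanic_data. The integer column names 2 and 3
# are assumed here to reproduce the mix of 'int' and 'str' column names.
titanic_data = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "Age": [22, 38, 26, 35, 28, 54, 2, 27, 14, 4, 58, 20],
    "Fare": [7.3, 71.3, 7.9, 53.1, 8.1, 51.9, 21.1, 11.1, 30.1, 16.7, 26.6, 8.1],
    2: [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    3: [1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1],
})

X = titanic_data.drop("Survived", axis=1)
y = titanic_data["Survived"]

# Convert every column name to a string before splitting, so that
# X_train and X_test both inherit the all-string names.
X.columns = X.columns.astype(str)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

logmodel = LogisticRegression(max_iter=1000)
logmodel.fit(X_train, y_train)

predictions = logmodel.predict(X_test)
print(list(X.columns))
print(len(predictions))
```

As the error message notes, another option is to drop the column names entirely, for instance by fitting on `X_train.to_numpy()`, at the cost of losing feature-name validation.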