python pandas machine-learning scikit-learn logistic-regression

Getting an error while training a logistic regression model

I am trying to fit a logistic regression model to a dataset, and when I call fit() on the training data I get the following error:

      1 from sklearn.linear_model import LogisticRegression
      2 classifier = LogisticRegression()
----> 3 classifier.fit(X_train, y_train)

ValueError: could not convert string to float: 'Cragorn'

The code snippet is as follows:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

data = pd.read_csv('predict_death_in_GOT.csv')
data.head(10)
X = data.iloc[:, 0:4]
y = data.iloc[:, 4]

plt.rcParams['figure.figsize'] = (10, 10)
alive = data.loc[y == 1]
not_alive = data.loc[y == 0]
plt.scatter(alive.iloc[:,0], alive.iloc[:,1], s = 10, label = "alive")
plt.scatter(not_alive.iloc[:,0], not_alive.iloc[:,1], s = 10, label = "not alive")
plt.legend()
plt.show()

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)
print(X_train, y_train)
print(X_test, y_test)

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)  # this line raises the ValueError

The dataset looks like this:

   Sr No  name                  houseID  titleID  isAlive
0  0      Viserys II Targaryen  0        0        0
1  1      Tommen Baratheon      0        0        1
2  2      Viserys I Targaryen   0        0        0
3  3      Will (orphan)         0        0        1
4  4      Will (squire)         0        0        1
5  5      Willam                0        0        1
6  6      Willow Witch-eye      0        0        0
7  7      Woth                  0        0        0
8  8      Wyl the Whittler      0        0        1
9  9      Wun Weg Wun Dar Wun   0        0        1

I searched the web but couldn't find a relevant solution. Please help me with this error. Thank you!

Solution

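You cannot pass strings to the fit() method: every feature has to be numeric. X includes the name column, which holds strings such as 'Cragorn', so it has to be converted to numbers first. One way to do that is sklearn.preprocessing.LabelEncoder, which replaces each distinct string with an integer.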
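To make the encoding step concrete, here is a minimal sketch of what LabelEncoder does to a handful of strings. The short list of names is picked by hand for illustration and is not the full dataset.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative list only: four entries, three distinct names.
names = ["Tommen Baratheon", "Willam", "Woth", "Willam"]

le = LabelEncoder()
codes = le.fit_transform(names)

print(le.classes_)                  # distinct names, sorted alphabetically
print(codes)                        # one integer per entry: [0 1 2 1]
print(le.inverse_transform(codes))  # maps the integers back to the names
```

The integers only reflect alphabetical order and carry no meaning of their own. The scikit-learn documentation describes LabelEncoder as intended for target labels, and OrdinalEncoder is its counterpart for feature columns, so treat this as a quick way to get past the error rather than a modelling recommendation.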

Using the sample dataset from the question (already loaded into data), here is an example of how to apply the label encoding before fitting:

from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

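# Replace each distinct name with an integer code
le = preprocessing.LabelEncoder()
data['name'] = le.fit_transform(data['name'])

# Features: the first four columns (Sr No, name, houseID, titleID)
X = data.iloc[:, 0:4]
# Target: isAlive is the fifth column (index 4), as in the question;
# index 5 is out of range for the five-column frame shown there
y = data.iloc[:, 4]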

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)

print(classifier.coef_,classifier.intercept_)

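Resulting model coefficients and intercept (the exact values vary between runs, because train_test_split shuffles the rows randomly unless random_state is set):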

[[ 0.09253555  0.09253555 -0.15407024  0.        ]] [-0.1015314]
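For anyone without the CSV file, the same steps can be sketched end to end on the ten sample rows shown in the question. This is an illustration under assumptions, not the original run: the frame is typed in by hand, columns are selected by name instead of by position, and random_state=0 is added only so the split is repeatable.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Ten sample rows copied from the question; the real CSV is larger.
data = pd.DataFrame({
    "Sr No": list(range(10)),
    "name": [
        "Viserys II Targaryen", "Tommen Baratheon", "Viserys I Targaryen",
        "Will (orphan)", "Will (squire)", "Willam", "Willow Witch-eye",
        "Woth", "Wyl the Whittler", "Wun Weg Wun Dar Wun",
    ],
    "houseID": [0] * 10,
    "titleID": [0] * 10,
    "isAlive": [0, 1, 0, 1, 1, 1, 0, 0, 1, 1],
})

# Convert the string column to integer codes so fit() accepts it.
data["name"] = LabelEncoder().fit_transform(data["name"])

# Select features and target by column name to avoid off-by-one indices.
X = data[["Sr No", "name", "houseID", "titleID"]]
y = data["isAlive"]

# random_state is an assumption added here to make the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)  # no ValueError: all features are numeric

print(classifier.coef_.shape)     # one row of four coefficients
print(classifier.predict(X_test))
```

With only ten rows the fitted coefficients are not meaningful; the point is that fit() runs once every column in X is numeric.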