I am learning data science by reading other people's scripts. One Titanic notebook on Kaggle applies Logistic Regression and then, according to the code, exports the predictions to a .csv file. However, it raises an error every time I run it. The original script is found here, and the .csv data that the code reads is here: train.csv test.csv
Input[24] through Input[28] set up LogisticRegression. The code runs without error up to Input[27]. When I run Input[28]:
acc_log = predict_model(X_data, Y_data, logreg, X_test_kaggle, 'submission_Logistic.csv')
I receive an error message:
ValueError: could not convert string to float: 'Q'
I tried wrapping the call in try/except to bypass the error so the rest of the code could continue:
try:
    acc_log = predict_model(X_data, Y_data, logreg, X_test_kaggle, 'submission_Logistic.csv')
except ValueError:
    pass
This code is a bit too sophisticated for me to debug: I can't tell which step goes wrong, or where in the data a string appears in place of the float the model expects. I would appreciate help understanding the error and finding a proper solution. Thanks.
It looks like you didn't run cell 16 in the notebook link you provided, in which Embarked values are converted to integers (including the string value Q, which is throwing the error you're seeing):
Cell 16
# fill the missing values of Embarked feature with the most common occurrence
freq_port = train_df.Embarked.dropna().mode()[0]
for dataset in combine:
    dataset['Embarked'] = dataset['Embarked'].fillna(freq_port)
train_df[['Embarked', 'Survived']].groupby(['Embarked'], as_index=False).mean().sort_values(by='Survived', ascending=False)
for dataset in combine:
    dataset['Embarked'] = dataset['Embarked'].map( {'S': 0, 'C': 1, 'Q': 2} ).astype(int)
train_df.head()
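To see what that cell does in isolation, here is a minimal sketch on a made-up five-row frame. The `toy` DataFrame and its values are assumptions for illustration only, not the Kaggle data; the fill and map steps mirror the ones in cell 16.

```python
import pandas as pd

# Hypothetical stand-in for train_df: one missing port and one 'Q'.
toy = pd.DataFrame({"Embarked": ["S", None, "Q", "C", "S"]})

# Fill the missing value with the most common port ('S' in this toy data).
freq_port = toy["Embarked"].dropna().mode()[0]
toy["Embarked"] = toy["Embarked"].fillna(freq_port)

# Replace the port letters with integers, as cell 16 does.
toy["Embarked"] = toy["Embarked"].map({"S": 0, "C": 1, "Q": 2}).astype(int)

print(toy["Embarked"].tolist())
```

Once the column holds only integers, there is no string left for the model to choke on.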
I just ran all the cells in order and the LogisticRegression section worked fine for me. Try shutting down your notebook and re-running all the cells in the order they appear.
A general data science tip:
When you've already trained a model but your predict() function is throwing an error, it's helpful to look at the test data you're inputting and try to figure out what's wrong there.
In this case, searching the values in X_test_kaggle for the string Q might have revealed that the problem was with the Embarked field, and that could have served as a first breadcrumb in tracking the problem back to its source.
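One way to do that search is sketched below. The small `X_test_kaggle` frame here is a made-up stand-in with only three columns, and the helpers `columns_containing` and `non_numeric_columns` are hypothetical names of my own, not part of the notebook.

```python
import pandas as pd

# Hypothetical stand-in for X_test_kaggle; the real frame has more columns.
X_test_kaggle = pd.DataFrame({
    "Pclass": [3, 1, 2],
    "Fare": [7.25, 71.28, 13.0],
    "Embarked": ["Q", "S", "C"],
})

def columns_containing(df, value):
    """Return the names of columns where at least one cell equals `value`."""
    return [col for col in df.columns if (df[col] == value).any()]

def non_numeric_columns(df):
    """Return the names of columns whose dtype is not numeric."""
    return [col for col in df.columns
            if not pd.api.types.is_numeric_dtype(df[col])]

print(columns_containing(X_test_kaggle, "Q"))
print(non_numeric_columns(X_test_kaggle))
```

Running the same two helpers on your real X_data and X_test_kaggle should point at any column that still holds strings, which is the column to trace back through the notebook.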