I am trying to construct a dataset to use in Keras for the Titanic example on Kaggle. Here is what I've done so far:
import pandas as pd
import tensorflow as tf

train_data = pd.read_csv("/kaggle/input/titanic/train.csv")
all_columns = ['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'] # all the columns names present in the csv
feature_columns = ['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'] # columns that I want to use as features for the training part
train_data = tf.data.experimental.make_csv_dataset(
    "/kaggle/input/titanic/train.csv",
    batch_size=12,
    column_names=all_columns,
    select_columns=feature_columns,
    label_name='Survived',  # name of the 'label' column
    na_value="?",
    num_epochs=1,
    ignore_errors=False)
But when I run this, I get the following error:
    495   if label_name is not None and label_name not in column_names:
    496     raise ValueError("`label_name` provided must be one of the columns.")
    497
    498   def filename_to_dataset(filename):

ValueError: `label_name` provided must be one of the columns.
But, as you can see, the label_name value is 'Survived', and it is present in all_columns (which is also what I pass as column_names).
Any idea?
Best
Aymeric
label_name must also be included in select_columns: when select_columns is given, only those columns are parsed, so 'Survived' is dropped before the label lookup happens.
Try:
train_data = tf.data.experimental.make_csv_dataset(
    "/kaggle/input/titanic/train.csv",
    batch_size=12,
    column_names=all_columns,
    select_columns=feature_columns + ['Survived'],  # the label column must be selected too
    label_name='Survived',  # name of the 'label' column
    na_value="?",
    num_epochs=1,
    ignore_errors=False)
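As a quick sanity check (a minimal sketch, assuming TensorFlow 2.x eager execution and the same Kaggle file path), you can pull a single batch from the resulting dataset and confirm that the features and the 'Survived' labels now come back separately:

# Inspect one batch: make_csv_dataset yields (features_dict, labels) pairs
for features, labels in train_data.take(1):
    print(labels.numpy())                 # batch of 0/1 'Survived' values
    for name, values in features.items():
        print(name, values.numpy())       # one tensor per selected feature column

Each batch of features is an OrderedDict keyed by column name, which fits nicely with a Keras functional model that declares one named Input per feature.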