I am new to TensorFlow. I am trying to read values from a CSV file and load them as a TensorFlow dataset. However, when I try to run model.fit, it gives the following error: Missing data for input "input_39". You passed a data dictionary with keys ['Age', 'Number', 'Start']. Expected the following keys: ['input_39']
Here is my code:
import numpy as np
import pandas as pd
import tensorflow as tf
input_file='kyphosis.csv'
all_dataset = tf.data.experimental.make_csv_dataset(input_file, batch_size=1,label_name="Kyphosis",num_epochs=1)
model=tf.keras.models.Sequential()
model.add(tf.keras.layers.Input(3))
model.add(tf.keras.layers.Dense(10))
model.add(tf.keras.layers.Dense(1,activation='sigmoid'))
model.compile(optimizer='adam',loss='binary_crossentropy',run_eagerly=True)
model.fit(all_dataset,epochs=10)
Please let me know what I am doing wrong here. The TensorFlow version is 2.11.0.
I also tried tf.data.Dataset.from_tensor_slices, but I get the same error:
df=pd.read_csv('kyphosis.csv')
X=df.drop('Kyphosis',axis=1)
y=df['Kyphosis']
all_dataset=tf.data.Dataset.from_tensor_slices((X.to_dict(orient='list'),y))
all_dataset = all_dataset.batch(1)
model=tf.keras.models.Sequential()
model.add(tf.keras.layers.Input(3))
model.add(tf.keras.layers.Dense(10))
model.add(tf.keras.layers.Dense(1,activation='sigmoid'))
model.compile(optimizer='adam',loss='binary_crossentropy')
model.fit(all_dataset,epochs=3)
Error: ValueError: Missing data for input "input_41". You passed a data dictionary with keys ['Age', 'Number', 'Start']. Expected the following keys: ['input_41']
tf.data.experimental.make_csv_dataset
returns an OrderedDict with the feature names as keys and the actual feature tensors as values.
dataset = tf.data.experimental.make_csv_dataset(
'test.csv',label_name='target',
batch_size=1,num_epochs=1)
If you look closely at the features and labels the dataset yields:
$ next(iter(dataset))
>> (OrderedDict([('sepal length (cm)',
<tf.Tensor: shape=(1,), dtype=float32, numpy=array([5.], dtype=float32)>),
('sepal width (cm)',
<tf.Tensor: shape=(1,), dtype=float32, numpy=array([2.3], dtype=float32)>),
('petal length (cm)',
<tf.Tensor: shape=(1,), dtype=float32, numpy=array([3.3], dtype=float32)>),
('petal width (cm)',
<tf.Tensor: shape=(1,), dtype=float32, numpy=array([1.], dtype=float32)>)]),
<tf.Tensor: shape=(1,), dtype=int32, numpy=array([1])>)
So, you cannot simply pass this OrderedDict as input to the model. You can convert it into a format the model understands by writing a pre-processing map function:
def pre_process(features, labels):
    # Stack the per-column tensors into a single feature vector along the last axis
    features = tf.stack([value for key, value in features.items()], axis=-1)
    return features, labels
dataset = dataset.map(pre_process)
Now if you have a look at the dataset, it will have features which can be passed into the model:
$ next(iter(dataset))
> (<tf.Tensor: shape=(1, 4), dtype=float32, numpy=array([[5.1, 3.8, 1.6, 0.2]], dtype=float32)>,
<tf.Tensor: shape=(1,), dtype=int32, numpy=array([0])>)
This dataset can now be passed directly into the model for training.
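Putting it together for your second snippet, here is a minimal end-to-end sketch. Since I don't have kyphosis.csv, the feature values and labels below are made-up stand-ins with the same three columns ('Age', 'Number', 'Start') and a binary label; the structure is what matters:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the kyphosis data: three feature columns, binary label
features = {
    'Age':    np.array([71., 158., 128., 2., 1.], dtype=np.float32),
    'Number': np.array([3., 3., 4., 5., 4.], dtype=np.float32),
    'Start':  np.array([5., 14., 5., 1., 15.], dtype=np.float32),
}
labels = np.array([0, 1, 1, 0, 0], dtype=np.int32)

dataset = tf.data.Dataset.from_tensor_slices((features, labels))

def pre_process(feats, lbls):
    # Stack the per-column scalars into a single (3,) feature vector
    return tf.stack([value for key, value in feats.items()], axis=-1), lbls

dataset = dataset.map(pre_process).batch(1)

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(3,)),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(dataset, epochs=3, verbose=0)
```

Each batch now has shape (1, 3), which matches the model's declared input, so the "Missing data for input" error no longer appears.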