Tags: tensorflow, machine-learning

model.fit gives error with tensorflow dataset created with tf.data.experimental.make_csv_dataset


I am new to tensorflow. I am trying to read values from a CSV file and load them as a tensorflow dataset. However, when I run model.fit, it gives the following error:

Missing data for input "input_39". You passed a data dictionary with keys ['Age', 'Number', 'Start']. Expected the following keys: ['input_39']

Here is my code:

import numpy as np
import pandas as pd
import tensorflow as tf

input_file='kyphosis.csv'

all_dataset = tf.data.experimental.make_csv_dataset(input_file, batch_size=1,label_name="Kyphosis",num_epochs=1)

model=tf.keras.models.Sequential()
model.add(tf.keras.layers.Input(3))
model.add(tf.keras.layers.Dense(10))
model.add(tf.keras.layers.Dense(1,activation='sigmoid'))

model.compile(optimizer='adam',loss='binary_crossentropy',run_eagerly=True)

model.fit(all_dataset,epochs=10)

Please let me know what I am doing wrong here. Tensorflow version is 2.11.0.

I also tried tf.data.Dataset.from_tensor_slices, but I get the same error:

df=pd.read_csv('kyphosis.csv')
X=df.drop('Kyphosis',axis=1)
y=df['Kyphosis']

all_dataset=tf.data.Dataset.from_tensor_slices((X.to_dict(orient='list'),y))
all_dataset = all_dataset.batch(1)

model=tf.keras.models.Sequential()
model.add(tf.keras.layers.Input(3))
model.add(tf.keras.layers.Dense(10))
model.add(tf.keras.layers.Dense(1,activation='sigmoid'))

model.compile(optimizer='adam',loss='binary_crossentropy')
model.fit(all_dataset,epochs=3)

Error: ValueError: Missing data for input "input_41". You passed a data dictionary with keys ['Age', 'Number', 'Start']. Expected the following keys: ['input_41']


Solution

  • tf.data.experimental.make_csv_dataset returns a dataset whose feature element is an OrderedDict, with feature names as keys and the actual feature tensors as values.

    dataset = tf.data.experimental.make_csv_dataset(
                    'test.csv',label_name='target',
                    batch_size=1,num_epochs=1)
    

    If you look closely at the features and labels yielded by the dataset:

    $ next(iter(dataset))
    >> (OrderedDict([('sepal length (cm)',
                   <tf.Tensor: shape=(1,), dtype=float32, numpy=array([5.], dtype=float32)>),
                  ('sepal width (cm)',
                   <tf.Tensor: shape=(1,), dtype=float32, numpy=array([2.3], dtype=float32)>),
                  ('petal length (cm)',
                   <tf.Tensor: shape=(1,), dtype=float32, numpy=array([3.3], dtype=float32)>),
                  ('petal width (cm)',
                   <tf.Tensor: shape=(1,), dtype=float32, numpy=array([1.], dtype=float32)>)]),
     <tf.Tensor: shape=(1,), dtype=int32, numpy=array([1])>)
    

    So, you cannot simply pass this ordered dictionary into the model as input. You can convert it into a format the model understands by writing a pre-processing mapping function:

    def pre_process(features, labels):
        # Stack the per-column tensors into a single feature vector per example
        features = tf.stack([value for key, value in features.items()], axis=-1)
        return features, labels
    
    dataset = dataset.map(pre_process)
    

    Now if you look at the dataset, each element has its features stacked into a single tensor that can be fed to the model:

    $ next(iter(dataset))
    > (<tf.Tensor: shape=(1, 4), dtype=float32, numpy=array([[5.1, 3.8, 1.6, 0.2]], dtype=float32)>,
     <tf.Tensor: shape=(1,), dtype=int32, numpy=array([0])>)
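
    The same stacking idea fixes the question's second attempt. Alternatively, since the data already lives in a DataFrame there, the dict can be avoided entirely by slicing plain numeric arrays. A minimal sketch, using a small synthetic frame as a stand-in for kyphosis.csv (the real file isn't available; the column names come from the question's error message and the values are made up):

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Small synthetic stand-in for kyphosis.csv; column names come from the
# question's error message, the values are made up for illustration.
df = pd.DataFrame({
    "Kyphosis": [0, 1, 0, 1],
    "Age": [71, 158, 128, 2],
    "Number": [3, 3, 4, 5],
    "Start": [5, 14, 5, 1],
})
X = df.drop("Kyphosis", axis=1)
y = df["Kyphosis"]

# Slicing plain arrays (instead of a dict of columns) yields a
# (features, label) tuple per element, so no mapping function is needed.
all_dataset = tf.data.Dataset.from_tensor_slices(
    (X.values.astype(np.float32), y.values.astype(np.float32))).batch(1)

features, label = next(iter(all_dataset))
print(features.shape)  # (1, 3)
```

    With elements of shape (1, 3), this dataset matches tf.keras.layers.Input(3) and can be passed straight to model.fit.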
    

    This dataset can now be passed directly to the model for training.
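
    Putting the pieces together for the question's setup, here is a minimal end-to-end sketch. It writes a tiny synthetic stand-in for kyphosis.csv (the real file isn't available; the column names come from the error message, and the 0/1 labels are an assumption):

```python
import tensorflow as tf

# Synthetic stand-in for kyphosis.csv; the real data isn't shown in the question.
csv_text = (
    "Kyphosis,Age,Number,Start\n"
    "0,71,3,5\n"
    "1,158,3,14\n"
    "0,128,4,5\n"
    "1,2,5,1\n"
)
with open("kyphosis_demo.csv", "w") as f:
    f.write(csv_text)

all_dataset = tf.data.experimental.make_csv_dataset(
    "kyphosis_demo.csv", batch_size=1, label_name="Kyphosis",
    num_epochs=1, shuffle=False)

def pre_process(features, labels):
    # Stack the dict of per-column tensors into one float feature vector.
    features = tf.stack(
        [tf.cast(value, tf.float32) for value in features.values()], axis=-1)
    return features, labels

all_dataset = all_dataset.map(pre_process)

model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(3,)),
    tf.keras.layers.Dense(10),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(all_dataset, epochs=2, verbose=0)
```

    Another option that keeps the dict is the functional API: create one named tf.keras.Input per CSV column and concatenate them; Keras then matches the dict keys to the input names, so no stacking map is needed.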