
Is there a method for Keras to read TFRecord datasets without additional data processing measures?


I am a high school student trying to learn the basics of TensorFlow. I am currently building a model whose input is TFRecord files, TensorFlow's default dataset file type, which were compressed from the original raw data. Right now I parse the data into NumPy arrays in a convoluted way so that Keras can interpret it. Since Keras is part of TF, it should be able to read TFRecord datasets easily. Is there any other way for Keras to understand TFRecord files?

I use the _decodeExampleHelper method to prepare the data for training.

def _decodeExampleHelper(example) :
  dataDictionary = {
    'xValues' : tf.io.FixedLenFeature([7], tf.float32),
    'yValues' : tf.io.FixedLenFeature([3], tf.float32)
  }
  # Parse the input tf.Example proto using the data dictionary
  example = tf.io.parse_single_example(example, dataDictionary)
  xValues = example['xValues']
  yValues = example['yValues']
  # The Keras Sequential network will have "dense" as the name of the first layer; dense_input is the input to this layer
  return {'dense_input': xValues}, yValues

data = tf.data.TFRecordDataset(workingDirectory + 'training.tfrecords')

parsedData = data.map(_decodeExampleHelper)

The following code block shows that each element of parsedData has the correct dimensions.

tmp = next(iter(parsedData))
print(tmp)

This outputs the first set of data in the correct dimensions that Keras should be able to interpret.

({'dense_input': <tf.Tensor: id=273, shape=(7,), dtype=float32, numpy=
array([-0.6065675 , -0.610906  , -0.65771157, -0.41417238,  0.89691925,
        0.7122903 ,  0.27881026], dtype=float32)>}, <tf.Tensor: id=274, shape=(3,), dtype=float32, numpy=array([ 0.        , -0.65868723, -0.27960175], dtype=float32)>)

Here is a very simple model with only two layers, which I train on the data I just parsed.

model = tf.keras.models.Sequential(
    [
      tf.keras.layers.Dense(20, activation = 'relu', input_shape = (7,)),
      tf.keras.layers.Dense(3, activation = 'linear'),
    ]
)

model.compile(optimizer = 'adam', loss = 'mean_absolute_error', metrics = ['accuracy'])

model.fit(parsedData, epochs = 1)

The line model.fit(parsedData, epochs = 1) raises ValueError: Error when checking input: expected dense_input to have shape (7,) but got array with shape (1,), even though dense_input has shape (7,).

What could the problem be in this case? Why can Keras not interpret the tensors from the file correctly?


Solution

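  • You need to batch your data before passing it to Keras, and you should use an explicit Input layer. Keras treats the first dimension of each dataset element as the batch dimension, so an unbatched element of shape (7,) is not read as a single sample with 7 features. The following works for me just fine: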

    import tensorflow as tf
    
    ds = tf.data.Dataset.from_tensors((
        {'dense_input': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]}, [ 0.0, 0.1, -0.1]))
    ds = ds.repeat(32).batch(32)
    
    model = tf.keras.models.Sequential(
        [
          tf.keras.Input(shape=(7,), name='dense_input'),
          tf.keras.layers.Dense(20, activation = 'relu'),
          tf.keras.layers.Dense(3, activation = 'linear'),
        ]
    )
    
    model.compile(optimizer = 'adam', loss = 'mean_absolute_error', metrics = ['accuracy'])
    
    model.fit(ds, epochs = 1)
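
Applied to the TFRecord pipeline from the question, the same fix might look roughly like the sketch below. The temporary file with 64 random records is only a hypothetical stand-in for training.tfrecords, and decode_example is an illustrative rename of _decodeExampleHelper that returns a plain (features, labels) tuple instead of a dict keyed by layer name. That is a simplification on my part, not something the original code requires.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the asker's training.tfrecords: 64 random records.
path = os.path.join(tempfile.mkdtemp(), "training.tfrecords")
rng = np.random.default_rng(0)
with tf.io.TFRecordWriter(path) as writer:
    for _ in range(64):
        feature = {
            "xValues": tf.train.Feature(
                float_list=tf.train.FloatList(value=rng.normal(size=7).tolist())),
            "yValues": tf.train.Feature(
                float_list=tf.train.FloatList(value=rng.normal(size=3).tolist())),
        }
        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())

# Same feature layout as in the question.
feature_spec = {
    "xValues": tf.io.FixedLenFeature([7], tf.float32),
    "yValues": tf.io.FixedLenFeature([3], tf.float32),
}

def decode_example(serialized):
    # Parse one serialized tf.Example into a (features, labels) pair.
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    return parsed["xValues"], parsed["yValues"]

# The key change: batch after parsing so Keras sees shape (batch, 7).
dataset = tf.data.TFRecordDataset(path).map(decode_example).batch(32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(7,)),
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(3, activation="linear"),
])
model.compile(optimizer="adam", loss="mean_absolute_error")
history = model.fit(dataset, epochs=1, verbose=0)
```

With .batch(32) in place, no conversion to NumPy arrays is needed: model.fit consumes the tf.data.Dataset directly. For real training data you would probably also add .shuffle(...) before .batch(...).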