Is there a method for Keras to read TFRecord datasets without additional data processing measures?

I am a high school student trying to learn the basics of TensorFlow. I am currently building a model that takes TFRecord files as input. TFRecord is TensorFlow's default dataset file format, and my files were produced by compressing the original raw data. At the moment I parse the data into numpy arrays in a convoluted way so that Keras can interpret it. Since Keras is part of TF, it should be able to read TFRecord datasets easily. Is there another way for Keras to understand TFRecord files?

I use the _decodeExampleHelper function to parse each record and prepare the data for training.

import tensorflow as tf

def _decodeExampleHelper(example):
  # Describes the fixed-length float features stored in each record
  dataDictionary = {
    'xValues': tf.io.FixedLenFeature([7], tf.float32),
    'yValues': tf.io.FixedLenFeature([3], tf.float32)
  }
  # Parse the input tf.Example proto using the data dictionary
  example = tf.io.parse_single_example(example, dataDictionary)
  xValues = example['xValues']
  yValues = example['yValues']
  # The first layer of the Keras Sequential network is named "dense",
  # so its input is named "dense_input"
  return {'dense_input': xValues}, yValues

data = tf.data.TFRecordDataset(workingDirectory + 'training.tfrecords')

parsedData = data.map(_decodeExampleHelper)

We can see that the parsedData has the correct dimensions in the following code block.

tmp = next(iter(parsedData))
print(tmp)

This prints the first element of the dataset, which has the dimensions that Keras should be able to interpret.

({'dense_input': <tf.Tensor: id=273, shape=(7,), dtype=float32, numpy=
array([-0.6065675 , -0.610906  , -0.65771157, -0.41417238,  0.89691925,
        0.7122903 ,  0.27881026], dtype=float32)>}, <tf.Tensor: id=274, shape=(3,), dtype=float32, numpy=array([ 0.        , -0.65868723, -0.27960175], dtype=float32)>)

Here is a very simple model with only two layers, which I train on the data I just parsed.

model = tf.keras.models.Sequential(
    [
      tf.keras.layers.Dense(20, activation = 'relu', input_shape = (7,)),
      tf.keras.layers.Dense(3, activation = 'linear'),
    ]
)

model.compile(optimizer = 'adam', loss = 'mean_absolute_error', metrics = ['accuracy'])

model.fit(parsedData, epochs = 1)

The line model.fit(parsedData, epochs = 1) raises ValueError: Error when checking input: expected dense_input to have shape (7,) but got array with shape (1,), even though dense_input has shape (7,).

What could the problem be in this case? Why can Keras not interpret the tensors from the file correctly?

Solution

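You need to batch your data before passing it to Keras, and you should use an Input layer. Keras treats the first dimension of each dataset element as the batch dimension, so an unbatched element of shape (7,) is not seen as a single sample with 7 features. The following works for me: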

import tensorflow as tf

ds = tf.data.Dataset.from_tensors((
    {'dense_input': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]}, [ 0.0, 0.1, -0.1]))
ds = ds.repeat(32).batch(32)

model = tf.keras.models.Sequential(
    [
      tf.keras.Input(shape=(7,), name='dense_input'),
      tf.keras.layers.Dense(20, activation = 'relu'),
      tf.keras.layers.Dense(3, activation = 'linear'),
    ]
)

model.compile(optimizer = 'adam', loss = 'mean_absolute_error', metrics = ['accuracy'])

model.fit(ds, epochs = 1)
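
The same fix can be applied to a dataset read from a TFRecord file. The sketch below is only an illustration under a few assumptions: it writes a small file of random values to a temporary directory as a hypothetical stand-in for training.tfrecords, and the names path, feature_spec and decode_example are made up for this example. It also returns a plain (features, labels) tuple instead of a dict keyed by 'dense_input', which is a simplification and not part of the code above.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical stand-in for training.tfrecords: 64 records of random floats.
path = os.path.join(tempfile.mkdtemp(), 'training.tfrecords')
rng = np.random.default_rng(0)
with tf.io.TFRecordWriter(path) as writer:
    for _ in range(64):
        features = {
            'xValues': tf.train.Feature(
                float_list=tf.train.FloatList(value=rng.normal(size=7).tolist())),
            'yValues': tf.train.Feature(
                float_list=tf.train.FloatList(value=rng.normal(size=3).tolist())),
        }
        example = tf.train.Example(features=tf.train.Features(feature=features))
        writer.write(example.SerializeToString())

# Same feature layout as in the question.
feature_spec = {
    'xValues': tf.io.FixedLenFeature([7], tf.float32),
    'yValues': tf.io.FixedLenFeature([3], tf.float32),
}

def decode_example(serialized):
    # Parse one serialized tf.Example into a (features, labels) pair.
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    return parsed['xValues'], parsed['yValues']

# The key step: batch the parsed records so each element has shape (batch, 7).
dataset = tf.data.TFRecordDataset(path).map(decode_example).batch(32)

model = tf.keras.models.Sequential([
    tf.keras.Input(shape=(7,)),
    tf.keras.layers.Dense(20, activation='relu'),
    tf.keras.layers.Dense(3, activation='linear'),
])

# The 'accuracy' metric is omitted here because the output is a regression target.
model.compile(optimizer='adam', loss='mean_absolute_error')
history = model.fit(dataset, epochs=1, verbose=0)
```

With the real file, only the TFRecordDataset path would change; the map and batch calls stay the same.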