python tensorflow keras tensorflow2.0 tensorflow-datasets

TensorFlow Data not working with multiple input keras model

I'm using TensorFlow Data and keras to construct a pipeline to train my model, and everything is working fine until this moment, using a single input image. It turns out that I need to include 2 new images with individual CNN architectures to be merged in single neural networking, totaling 3 inputs. Let me show the code:

def read_tfrecord(example_proto):
    image_feature_description = {
    'label': tf.io.FixedLenFeature([], tf.int64),
    'full_image': tf.io.FixedLenFeature([], tf.string),
    'hand_0': tf.io.FixedLenFeature([], tf.string),
    'hand_1': tf.io.FixedLenFeature([], tf.string)
    }

    example = tf.io.parse_single_example(example_proto, image_feature_description)
    return read_full_image_and_label(example)

def load_dataset(tf_record_path):
    raw_dataset = tf.data.TFRecordDataset(tf_record_path, num_parallel_reads=AUTOTUNE)

    parsed_dataset = raw_dataset.map(read_tfrecord, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    return parsed_dataset 

def load_data_tfrecord(tfrecord_path, augmentation):
  dataset = load_dataset(tfrecord_path)

  dataset = prepare_for_training(dataset, augmentation)
  return dataset

data_generator['train'] = load_data_tfrecord(train_fns, True)

I follow this sequence to read my TF Records, until this point the code is almost the same as what was working previously, with the addition of the 'hand_0' and 'hand_1' (my new images). To get the images I followed this approach:

def decode_and_resize_img(tf_btstring, img_width, img_height):
  image = tf.image.decode_jpeg(tf_btstring, channels=3)
  image = tf.image.resize(image, [img_width, img_height])

  return image 

def read_full_image_and_label(tf_bytestring):
    image = decode_and_resize_img(tf_bytestring['full_image'], 224, 224)
    hand_0 = decode_and_resize_img(tf_bytestring['hand_0'], 224, 224)
    hand_1 = decode_and_resize_img(tf_bytestring['hand_1'], 224, 224)  

    label = tf.cast(tf_bytestring['label'], tf.int32)

    return {'image_data': image, 'hnd_0': hand_0, 'hnd_1': hand_1}, label

Instead of returning just an image and a label, I created an object to put all my images. In the last step, I prepare the dataset to feed the model:

def prepare_for_training(ds, augmentation=True, shuffle_buffer_size=1000):
    ds.cache()
    ds = ds.repeat()
    ds = ds.batch(batch_size)

    ds = ds.unbatch()
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    ds = ds.batch(batch_size)
    ds = ds.prefetch(AUTOTUNE)
    return ds

In the whole architecture, I just changed the return of the image by an object and the TFRecord description. I Also changed the CNN/NN topology, as follow:

    full_body_model = EfficientNetB4(input_shape=(img_width, img_height, 3), weights='imagenet', include_top=False)
    hand_1_model =  EfficientNetB0(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
    hand_2_model =  EfficientNetB0(input_shape=(224, 224, 3), weights='imagenet', include_top=False)

    full_body_model.trainable = False
    hand_1_model.trainable = False
    hand_2_model.trainable = False

    for layer in hand_1_model.layers:
      layer._name = 'hand_1' + str(layer._name)

    for layer in hand_2_model.layers:
      layer._name = 'hand_2' + str(layer._name)

    fbm = full_body_model.output
    fbm = GlobalAveragePooling2D()(fbm)

    h1m = hand_1_model.output
    h1m = GlobalAveragePooling2D()(h1m)
    
    h2m = hand_2_model.output
    h2m = GlobalAveragePooling2D()(h2m)

    x = Concatenate()([fbm, h1m, h2m])
    x = Dense(512, activation='elu', name='fc1')(x)
    x = Dropout(0.4)(x)
    x = Dense(256, activation='elu', name='fc2')(x)
    x = Dropout(0.4)(x)
    x = Dense(128, activation='elu', name='fc3')(x)
    x = Dropout(0.4)(x)
    x = Dense(64, activation='elu', name='fc4')(x)
    x = Dropout(0.4)(x)

    predictions = Dense(nb_classes, activation='softmax', name='predictions')(x)

    model = Model([full_body_model.input, hand_1_model.input, hand_2_model.input], predictions)

Now, I'm facing this error:

ValueError: in user code:

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:811 train_function  *
        for _ in math_ops.range(self._steps_per_execution):
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/operators/control_flow.py:414 for_stmt
        symbol_names, opts)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/operators/control_flow.py:629 _tf_range_for_stmt
        opts)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/operators/control_flow.py:1059 _tf_while_stmt
        body, get_state, set_state, init_vars, nulls, symbol_names)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/operators/control_flow.py:1032 _try_handling_undefineds
        _verify_loop_init_vars(init_vars, symbol_names, first_iter_vars)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/operators/control_flow.py:193 _verify_loop_init_vars
        raise ValueError(error_msg)

    ValueError: 'outputs' must be defined before the loop.

I've tried other approaches, like here: https://github.com/tensorflow/tensorflow/issues/20698, but I always got the same error. I'd love some help to solve this problem!

I'm using the current version of Google Colab, with TF 2.4.0

Solution

To solve this issue I've changed the way that my model receives the input data. I'm using a custom generator to stack the multi-input images together and pass this list to the fit method. This is my code:

    def train_gen():
      for (stacked, label) in train_ds:
        label = label 
        img1 = stacked[:,0,:,:]    
        img2 = stacked[:,1,:,:]
        img3 = stacked[:,2,:,:]
        img1, img2, img3 = transform_batch(img1, img2, img3) #data augmentation
        yield [img1, img2, img3], label