python tensorflow keras tensorflow2.0 tensorflow-datasets

TensorFlow Data not working with multiple-input Keras model


I'm using TensorFlow Data (tf.data) and Keras to build a pipeline to train my model, and everything worked fine as long as I used a single input image. I now need to include 2 new images, each with its own CNN architecture, and merge the branches into a single neural network, for a total of 3 inputs. Let me show the code:

def read_tfrecord(example_proto):
    image_feature_description = {
    'label': tf.io.FixedLenFeature([], tf.int64),
    'full_image': tf.io.FixedLenFeature([], tf.string),
    'hand_0': tf.io.FixedLenFeature([], tf.string),
    'hand_1': tf.io.FixedLenFeature([], tf.string)
    }

    example = tf.io.parse_single_example(example_proto, image_feature_description)
    return read_full_image_and_label(example)

def load_dataset(tf_record_path):
    raw_dataset = tf.data.TFRecordDataset(tf_record_path, num_parallel_reads=AUTOTUNE)

    parsed_dataset = raw_dataset.map(read_tfrecord, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    return parsed_dataset 

def load_data_tfrecord(tfrecord_path, augmentation):
  dataset = load_dataset(tfrecord_path)

  dataset = prepare_for_training(dataset, augmentation)
  return dataset

data_generator['train'] = load_data_tfrecord(train_fns, True)

I follow this sequence to read my TFRecords. Up to this point the code is almost the same as what was working previously, with the addition of 'hand_0' and 'hand_1' (my new images). To decode the images I followed this approach:

def decode_and_resize_img(tf_btstring, img_width, img_height):
  image = tf.image.decode_jpeg(tf_btstring, channels=3)
  # tf.image.resize expects the target size as [height, width]
  image = tf.image.resize(image, [img_height, img_width])

  return image

def read_full_image_and_label(tf_bytestring):
    image = decode_and_resize_img(tf_bytestring['full_image'], 224, 224)
    hand_0 = decode_and_resize_img(tf_bytestring['hand_0'], 224, 224)
    hand_1 = decode_and_resize_img(tf_bytestring['hand_1'], 224, 224)  

    label = tf.cast(tf_bytestring['label'], tf.int32)

    return {'image_data': image, 'hnd_0': hand_0, 'hnd_1': hand_1}, label
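
One way to check the parsing and decoding path in isolation is to round-trip a synthetic record through the same feature description. The sketch below is only an illustration: `make_example`, `parse`, and the random 32x48 images are hypothetical stand-ins for however the real TFRecords were written, and the parsing mirrors the functions above rather than calling them.

```python
import numpy as np
import tensorflow as tf

def make_example(label, height=32, width=48):
    """Serialize a tf.train.Example holding a label and three random JPEGs (hypothetical writer)."""
    def jpeg_feature():
        pixels = np.random.randint(0, 256, size=(height, width, 3), dtype=np.uint8)
        encoded = tf.io.encode_jpeg(pixels).numpy()
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded]))

    feature = {
        'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        'full_image': jpeg_feature(),
        'hand_0': jpeg_feature(),
        'hand_1': jpeg_feature(),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

# Same feature description as in read_tfrecord.
feature_description = {
    'label': tf.io.FixedLenFeature([], tf.int64),
    'full_image': tf.io.FixedLenFeature([], tf.string),
    'hand_0': tf.io.FixedLenFeature([], tf.string),
    'hand_1': tf.io.FixedLenFeature([], tf.string),
}

# Maps the dictionary keys returned to the model onto the TFRecord feature names.
key_map = [('image_data', 'full_image'), ('hnd_0', 'hand_0'), ('hnd_1', 'hand_1')]

def parse(serialized):
    example = tf.io.parse_single_example(serialized, feature_description)
    images = {
        out_key: tf.image.resize(
            tf.image.decode_jpeg(example[in_key], channels=3), [224, 224])
        for out_key, in_key in key_map
    }
    return images, tf.cast(example['label'], tf.int32)

serialized = [make_example(label) for label in (0, 1, 2)]
dataset = tf.data.Dataset.from_tensor_slices(serialized).map(parse)

# Inspect the structure the model will receive: a dict of images plus a label.
print(dataset.element_spec)
```

Printing `element_spec` before training is a cheap way to confirm the keys, shapes, and dtypes that the dataset actually produces.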

Instead of returning just an image and a label, I now return a dictionary holding all three images, together with the label. In the last step, I prepare the dataset to feed the model:

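def prepare_for_training(ds, augmentation=True, shuffle_buffer_size=1000):
    # Dataset transformations return a new dataset, so the result must be assigned.
    ds = ds.cache()
    ds = ds.repeat()
    # The earlier batch()/unbatch() pair cancelled itself out and has been dropped.
    ds = ds.shuffle(buffer_size=shuffle_buffer_size)
    ds = ds.batch(batch_size)
    ds = ds.prefetch(AUTOTUNE)
    # Note: `augmentation` is accepted but not used in this snippet.
    return ds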
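
Each tf.data transformation returns a new dataset rather than modifying the one it is called on, so a call such as `ds.cache()` only takes effect when its result is assigned back. The following sketch uses a toy `tf.data.Dataset.range` in place of the parsed TFRecords (an assumption made purely for illustration) and shows one common ordering: cache, shuffle, batch, then prefetch.

```python
import tensorflow as tf

toy = tf.data.Dataset.range(10)

# The result of this call is discarded, so `toy` itself is not batched.
toy.batch(4)
unchanged = list(toy.as_numpy_iterator())  # still ten scalar elements

# Assigning (or chaining) the transformations is what builds the pipeline.
pipeline = (
    toy.cache()
    .shuffle(buffer_size=10, seed=0)
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)
batches = [batch.tolist() for batch in pipeline.as_numpy_iterator()]

print(len(unchanged), [len(batch) for batch in batches])
```

For finite data, `repeat()` can be added after `shuffle()` if the training loop relies on `steps_per_epoch`.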

Across the whole pipeline, I only changed the return value (a dictionary instead of a single image) and the TFRecord feature description. I also changed the CNN/NN topology, as follows:

    full_body_model = EfficientNetB4(input_shape=(img_width, img_height, 3), weights='imagenet', include_top=False)
    hand_1_model =  EfficientNetB0(input_shape=(224, 224, 3), weights='imagenet', include_top=False)
    hand_2_model =  EfficientNetB0(input_shape=(224, 224, 3), weights='imagenet', include_top=False)

    full_body_model.trainable = False
    hand_1_model.trainable = False
    hand_2_model.trainable = False

    for layer in hand_1_model.layers:
      layer._name = 'hand_1' + str(layer._name)

    for layer in hand_2_model.layers:
      layer._name = 'hand_2' + str(layer._name)

    fbm = full_body_model.output
    fbm = GlobalAveragePooling2D()(fbm)

    h1m = hand_1_model.output
    h1m = GlobalAveragePooling2D()(h1m)
    
    h2m = hand_2_model.output
    h2m = GlobalAveragePooling2D()(h2m)

    x = Concatenate()([fbm, h1m, h2m])
    x = Dense(512, activation='elu', name='fc1')(x)
    x = Dropout(0.4)(x)
    x = Dense(256, activation='elu', name='fc2')(x)
    x = Dropout(0.4)(x)
    x = Dense(128, activation='elu', name='fc3')(x)
    x = Dropout(0.4)(x)
    x = Dense(64, activation='elu', name='fc4')(x)
    x = Dropout(0.4)(x)

    predictions = Dense(nb_classes, activation='softmax', name='predictions')(x)

    model = Model([full_body_model.input, hand_1_model.input, hand_2_model.input], predictions)
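
When a dataset yields a dictionary of features, Keras can route each entry to a model input by name, provided the `Input` layers carry the same names as the dictionary keys. The sketch below illustrates that idea with tiny convolutional branches standing in for the EfficientNet backbones. The `small_branch` helper, the 32x32 input size, and `nb_classes = 5` are assumptions for illustration, and this is not claimed to be the fix for the error reported in this question.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

nb_classes = 5  # hypothetical number of classes

def small_branch(name):
    """Tiny stand-in for a pretrained backbone; the Input name matches a dataset key."""
    inp = layers.Input(shape=(32, 32, 3), name=name)
    x = layers.Conv2D(4, 3, activation='relu')(inp)
    x = layers.GlobalAveragePooling2D()(x)
    return inp, x

in_full, feat_full = small_branch('image_data')
in_h0, feat_h0 = small_branch('hnd_0')
in_h1, feat_h1 = small_branch('hnd_1')

x = layers.Concatenate()([feat_full, feat_h0, feat_h1])
predictions = layers.Dense(nb_classes, activation='softmax', name='predictions')(x)
model = Model([in_full, in_h0, in_h1], predictions)

# Toy data: the dictionary keys are the same strings as the Input layer names.
features = {name: tf.zeros((8, 32, 32, 3)) for name in ('image_data', 'hnd_0', 'hnd_1')}
labels = tf.zeros((8,), dtype=tf.int32)
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(4)

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
history = model.fit(dataset, epochs=1, verbose=0)

probs = model(features)
print(probs.shape)
```

With pretrained backbones, the input layer names come from the backbone itself, so they would need to be checked (for example by printing the names of `model.inputs`) against the keys returned by the dataset.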

Now, I'm facing this error:

ValueError: in user code:

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:811 train_function  *
        for _ in math_ops.range(self._steps_per_execution):
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/operators/control_flow.py:414 for_stmt
        symbol_names, opts)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/operators/control_flow.py:629 _tf_range_for_stmt
        opts)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/operators/control_flow.py:1059 _tf_while_stmt
        body, get_state, set_state, init_vars, nulls, symbol_names)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/operators/control_flow.py:1032 _try_handling_undefineds
        _verify_loop_init_vars(init_vars, symbol_names, first_iter_vars)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/autograph/operators/control_flow.py:193 _verify_loop_init_vars
        raise ValueError(error_msg)

    ValueError: 'outputs' must be defined before the loop.

I've tried other approaches, such as the one described at https://github.com/tensorflow/tensorflow/issues/20698, but I always get the same error. I'd appreciate any help solving this problem!

I'm using the current version of Google Colab, with TF 2.4.0.


Solution

  • To solve this issue, I changed the way my model receives the input data. I now use a custom generator that takes the stacked multi-input images, splits them into a list of separate inputs, and passes this list to the fit method. This is my code:

        def train_gen():
          # train_ds yields (stacked, label); the three images sit along axis 1 of `stacked`
          for stacked, label in train_ds:
            img1 = stacked[:, 0, :, :]
            img2 = stacked[:, 1, :, :]
            img3 = stacked[:, 2, :, :]
            img1, img2, img3 = transform_batch(img1, img2, img3)  # data augmentation
            yield [img1, img2, img3], label
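
The generator above expects `train_ds` to yield batches in which the three images share one tensor. The answer does not show how that tensor is built. One plausible way, assumed here, is to `tf.stack` the decoded images along a new axis in the `map` function, so that batching gives a shape of (batch, 3, height, width, channels). In this sketch `stack_inputs` and the zero-filled toy data are hypothetical, and the augmentation step is omitted.

```python
import tensorflow as tf

def stack_inputs(features, label):
    """Stack the three images of one example along a new leading axis (assumed layout)."""
    stacked = tf.stack(
        [features['image_data'], features['hnd_0'], features['hnd_1']], axis=0)
    return stacked, label

# Toy stand-in for the parsed TFRecord dataset: 6 examples of 16x16 RGB images.
features = {name: tf.zeros((6, 16, 16, 3)) for name in ('image_data', 'hnd_0', 'hnd_1')}
labels = tf.range(6)
train_ds = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .map(stack_inputs)
    .batch(2)
)

def train_gen():
    # Each batch has shape (batch, 3, height, width, channels); index axis 1 to split it.
    for stacked, label in train_ds:
        yield [stacked[:, 0], stacked[:, 1], stacked[:, 2]], label

inputs, label = next(train_gen())
print([tuple(t.shape) for t in inputs], tuple(label.shape))
```

Stacking only works when all three images have the same shape, and a generator like this is exhausted after one pass over `train_ds`, so multi-epoch training relies on the dataset repeating.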