python tensorflow keras loss-function siamese-network

TensorFlow Custom Loss Function Error: Node: 'gradient_tape/contrastive_loss/mul/BroadcastGradientArgs' Incompatible shapes [0,1] vs. [32,1]

I'm working on a Siamese network using TensorFlow and Keras. Because the dataset is large, I'm using a generator to load it in batches. I train with a custom contrastive loss function, but I get an error during the gradient computation.

Error:

Epoch 1/10
y_true shape: (None, 1)
y_pred shape: (None, 1)
loss: Tensor("contrastive_loss/Mean:0", shape=(), dtype=float32)
y_true shape: (None, 1)
y_pred shape: (None, 1)
loss: Tensor("contrastive_loss/Mean:0", shape=(), dtype=float32)
   1/1125 [..............................] - ETA: 7:56:52 - loss: 0.1281 - accuracy: 0.0625
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-17-d373b6e0f1af> in <cell line: 11>()
      9 #val_gen = data_generator(val_pairs, val_labels, batch_size, os.path.join(extract_path, 'train'))
     10 
---> 11 model.fit(
     12     train_gen,
     13     # validation_data=val_gen,

1 frames
/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     51   try:
     52     ctx.ensure_initialized()
---> 53     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     54                                         inputs, attrs, num_outputs)
     55   except core._NotOkStatusException as e:

InvalidArgumentError: Graph execution error:

Detected at node 'gradient_tape/contrastive_loss/mul_1/BroadcastGradientArgs' defined at (most recent call last):
    File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
------
------"Error messages condensed"
------
    File "/usr/local/lib/python3.10/dist-packages/keras/src/optimizers/optimizer.py", line 276, in compute_gradients
      grads = tape.gradient(loss, var_list)
Node: 'gradient_tape/contrastive_loss/mul/BroadcastGradientArgs'
Incompatible shapes: [0,1] vs. [32,1]
     [[{{node gradient_tape/contrastive_loss/mul/BroadcastGradientArgs}}]] [Op:__inference_train_function_11308]
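The message itself describes a broadcasting failure. TensorFlow follows NumPy-style broadcasting, where two dimensions are compatible only if they are equal or one of them is 1, so a batch dimension of 0 cannot be combined with one of 32. A minimal sketch that reproduces the rule with NumPy (the helper name shapes_broadcast is illustrative, and np.broadcast_shapes requires NumPy 1.20 or later):

```python
import numpy as np

def shapes_broadcast(a, b):
    """Return True if NumPy-style broadcasting accepts shapes a and b."""
    try:
        np.broadcast_shapes(a, b)
        return True
    except ValueError:
        return False

# An empty y_true batch vs. a full y_pred batch of 32
print(shapes_broadcast((0, 1), (32, 1)))   # False: 0 and 32 differ and neither is 1
print(shapes_broadcast((32, 1), (32, 1)))  # True: identical shapes
print(shapes_broadcast((1, 1), (32, 1)))   # True: a size-1 dimension is stretched
```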

Data Generator:

def data_generator(pairs, labels, batch_size, img_dir):
    """
    Generate batches of images and labels for training/validation.

    Parameters:
    - pairs: List of tuples containing left image id, list of candidate right image ids,
             and index of ground truth right image.
    - batch_size: Number of pairs to load in each batch.
    - img_dir: Directory containing the images.

    Yields:
    Batch of images and labels.
    """
    num_samples = len(pairs)

    while True:
        # Shuffle pairs for randomness in each epoch
        # np.random.shuffle(pairs)

        # Create a list of sequence indices
        indices = np.arange(num_samples)

        # Shuffle the indices
        np.random.shuffle(indices)

        # Use the shuffled indices to shuffle the sequences
        pairs = np.array(pairs)
        pairs = pairs[indices]
        pairs = pairs.tolist()

        labels = np.array(labels)
        labels = labels[indices]
        labels = labels.tolist()

        for start_idx in range(0, num_samples, batch_size):
            end_idx = min(start_idx + batch_size, num_samples)
            batch_pairs = pairs[start_idx:end_idx]
            labels = labels[start_idx:end_idx]

            left_images = []
            right_images = []
            # labels = []

            for pair in batch_pairs:
                left_img_id, right_img_id = pair

                # Load left image
                left_img = load_and_preprocess_image(left_img_id, img_dir, 'left')
                left_images.append(left_img)

                # Load right images
                right_img = load_and_preprocess_image(right_img_id, img_dir, 'right')
                right_images.append(right_img)

            # Convert lists to numpy arrays
            left_images = np.array(left_images)
            right_images = np.array(right_images)
            labels = np.array(labels)

            yield [left_images, right_images], labels
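Stripping the image loading out of the batching loop above makes the effect of reassigning labels inside the loop visible. The function below is a hypothetical reproduction, not part of the original code; it records how many pairs and how many labels each batch would contain:

```python
import numpy as np

def buggy_label_batches(pairs, labels, batch_size):
    """Hypothetical reproduction of the batching loop above, minus image loading.

    Returns a (number of pairs, number of labels) tuple for every batch."""
    num_samples = len(pairs)
    counts = []
    for start_idx in range(0, num_samples, batch_size):
        end_idx = min(start_idx + batch_size, num_samples)
        batch_pairs = pairs[start_idx:end_idx]  # new name: pairs stays intact
        labels = labels[start_idx:end_idx]      # same name: the full label list is lost
        labels = np.array(labels)
        counts.append((len(batch_pairs), len(labels)))
    return counts

print(buggy_label_batches(list(range(100)), list(range(100)), 32))
# [(32, 32), (32, 0), (32, 0), (4, 0)]
```

Only the first batch is consistent; every later batch pairs its images with zero labels, which matches the training log failing right after step 1/1125.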

Model Training Code:

batch_size = 32
train_gen = data_generator(train_pairs, train_labels, batch_size, os.path.join(extract_path, 'train'))
val_gen = data_generator(val_pairs, val_labels, batch_size, os.path.join(extract_path, 'train'))

model.compile(optimizer='rmsprop', loss=contrastive_loss)
model.fit(
    train_gen,
    validation_data=val_gen,
    steps_per_epoch=len(train_pairs) // batch_size,
    validation_steps=len(val_pairs) // batch_size,
    epochs=10
)
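Inspecting only the first batch from a generator will not catch this kind of bug, because the first batch is correct. A small helper (hypothetical, not from the original post) that pulls several batches before calling model.fit would expose the mismatch on the second batch. This sketch assumes the generator yields ([left_images, right_images], labels) as data_generator does:

```python
import numpy as np

def batch_sizes(gen, num_batches):
    """Pull num_batches batches from gen and return the (left, right, labels)
    lengths of each one. Assumes gen yields ([left_images, right_images], labels)."""
    sizes = []
    for _ in range(num_batches):
        (left, right), y = next(gen)
        sizes.append((len(left), len(right), len(y)))
    return sizes

# Toy batches: the second one has images but no labels
toy = iter([
    ([np.zeros((32, 8)), np.zeros((32, 8))], np.zeros(32)),
    ([np.zeros((32, 8)), np.zeros((32, 8))], np.zeros(0)),
])
print(batch_sizes(toy, 2))  # [(32, 32, 32), (32, 32, 0)]
```

With the real generator, something like `all(len(set(s)) == 1 for s in batch_sizes(train_gen, 3))` should be True before training starts.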

I've verified that both y_true and y_pred have the shape (None, 1) during the forward pass, as the print output at the top of the error log shows. I'm unsure why the error reports a shape of [32,1].

Does anyone have any ideas on what might be causing this or how to resolve it?

I added print statements to the loss function to check the shapes, and I double-checked the shapes produced by my generator function, which seem right. I expected to find a shape mismatch between y_true and y_pred, but their shapes were the same, which puzzled me. I searched online for solutions but couldn't resolve it.

UPDATE: Issue fixed. The cause was in the generator method; see the solution below.

Solution

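The issue was in the generator method. Inside the batch loop, the line labels = labels[start_idx:end_idx] replaced the full label list with the first batch's 32 labels. On the second iteration, labels[32:64] was taken from that 32-element array and came back empty, so the generator yielded 32 image pairs with 0 labels. y_true therefore had shape [0,1] while y_pred had shape [32,1], hence the shape mismatch error. The print statements in the loss function did not reveal this because a Python print only runs while Keras traces the training function, when the batch dimension is still None; tf.print(tf.shape(y_true)) would print the actual shape at every step.

The fix is to slice into a new variable, batch_labels = labels[start_idx:end_idx], and to use batch_labels in the later np.array conversion and in the yield.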
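A corrected generator along those lines is sketched below. Besides slicing into batch_labels, it shuffles with np.random.permutation into separate variables instead of reassigning pairs and labels, so neither argument is ever replaced. load_and_preprocess_image is not shown in the question, so a hypothetical stub that returns a blank 64x64 RGB image stands in for it; replace it with the real loader. The sketch assumes each pair is a (left_id, right_id) tuple, as the original loop unpacks it.

```python
import numpy as np

def load_and_preprocess_image(img_id, img_dir, side):
    # Hypothetical stand-in for the loader, which the question does not show.
    # Returns a blank 64x64 RGB image so the generator can run on its own.
    return np.zeros((64, 64, 3), dtype=np.float32)

def data_generator(pairs, labels, batch_size, img_dir):
    """Yield ([left_images, right_images], batch_labels) batches indefinitely."""
    pairs = np.asarray(pairs)
    labels = np.asarray(labels)
    num_samples = len(pairs)

    while True:
        # Reshuffle pairs and labels with the same permutation on every pass
        indices = np.random.permutation(num_samples)
        shuffled_pairs = pairs[indices]
        shuffled_labels = labels[indices]

        for start_idx in range(0, num_samples, batch_size):
            end_idx = min(start_idx + batch_size, num_samples)
            batch_pairs = shuffled_pairs[start_idx:end_idx]
            # Slice into a new name so the full label array is never overwritten
            batch_labels = shuffled_labels[start_idx:end_idx]

            left_images = np.array([load_and_preprocess_image(left_id, img_dir, 'left')
                                    for left_id, _ in batch_pairs])
            right_images = np.array([load_and_preprocess_image(right_id, img_dir, 'right')
                                     for _, right_id in batch_pairs])

            yield [left_images, right_images], batch_labels
```

With this version every batch carries as many labels as image pairs, including the shorter final batch of each pass (4 pairs when there are 100 samples and batch_size is 32).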