I'm working on a Siamese network using TensorFlow and Keras. Because the dataset is large, I'm using a Python generator for batch loading. I'm training with a custom contrastive loss function, but I'm hitting an error during the gradient calculation.
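The loss function itself isn't shown below; it follows the usual contrastive-loss pattern, roughly like this sketch (the margin value and the debug prints are illustrative, my exact implementation may differ slightly):

import tensorflow as tf

def contrastive_loss(y_true, y_pred, margin=1.0):
    # y_pred is the distance between the two embeddings; y_true is 1 for
    # matching pairs and 0 for non-matching pairs.
    print("y_true shape:", y_true.shape)
    print("y_pred shape:", y_pred.shape)
    y_true = tf.cast(y_true, y_pred.dtype)
    loss = tf.reduce_mean(
        y_true * tf.square(y_pred)
        + (1.0 - y_true) * tf.square(tf.maximum(margin - y_pred, 0.0))
    )
    print("loss:", loss)
    return loss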
Error:
Epoch 1/10
y_true shape: (None, 1)
y_pred shape: (None, 1)
loss: Tensor("contrastive_loss/Mean:0", shape=(), dtype=float32)
y_true shape: (None, 1)
y_pred shape: (None, 1)
loss: Tensor("contrastive_loss/Mean:0", shape=(), dtype=float32)
1/1125 [..............................] - ETA: 7:56:52 - loss: 0.1281 - accuracy: 0.0625
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-17-d373b6e0f1af> in <cell line: 11>()
9 #val_gen = data_generator(val_pairs, val_labels, batch_size, os.path.join(extract_path, 'train'))
10
---> 11 model.fit(
12 train_gen,
13 # validation_data=val_gen,
1 frames
/usr/local/lib/python3.10/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
51 try:
52 ctx.ensure_initialized()
---> 53 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
54 inputs, attrs, num_outputs)
55 except core._NotOkStatusException as e:
InvalidArgumentError: Graph execution error:
Detected at node 'gradient_tape/contrastive_loss/mul_1/BroadcastGradientArgs' defined at (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
------ "Error messages condensed" ------
File "/usr/local/lib/python3.10/dist-packages/keras/src/optimizers/optimizer.py", line 276, in compute_gradients
grads = tape.gradient(loss, var_list)
Node: 'gradient_tape/contrastive_loss/mul/BroadcastGradientArgs'
Incompatible shapes: [0,1] vs. [32,1]
[[{{node gradient_tape/contrastive_loss/mul/BroadcastGradientArgs}}]] [Op:__inference_train_function_11308]
Data Generator:
def data_generator(pairs, labels, batch_size, img_dir):
    """
    Generate batches of images and labels for training/validation.

    Parameters:
    - pairs: List of tuples containing left image id, list of candidate right image ids,
      and index of ground truth right image.
    - batch_size: Number of pairs to load in each batch.
    - img_dir: Directory containing the images.

    Yields:
    Batch of images and labels.
    """
    num_samples = len(pairs)
    while True:
        # Shuffle pairs for randomness in each epoch
        # np.random.shuffle(pairs)
        # Create a list of sequence indices
        indices = np.arange(num_samples)
        # Shuffle the indices
        np.random.shuffle(indices)
        # Use the shuffled indices to shuffle the sequences
        pairs = np.array(pairs)
        pairs = pairs[indices]
        pairs = pairs.tolist()
        labels = np.array(labels)
        labels = labels[indices]
        labels = labels.tolist()

        for start_idx in range(0, num_samples, batch_size):
            end_idx = min(start_idx + batch_size, num_samples)
            batch_pairs = pairs[start_idx:end_idx]
            labels = labels[start_idx:end_idx]

            left_images = []
            right_images = []
            # labels = []
            for pair in batch_pairs:
                left_img_id, right_img_id = pair
                # Load left image
                left_img = load_and_preprocess_image(left_img_id, img_dir, 'left')
                left_images.append(left_img)
                # Load right images
                right_img = load_and_preprocess_image(right_img_id, img_dir, 'right')
                right_images.append(right_img)

            # Convert lists to numpy arrays
            left_images = np.array(left_images)
            right_images = np.array(right_images)
            labels = np.array(labels)

            yield [left_images, right_images], labels
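One way to sanity-check the generator independently of model.fit is to pull a few batches with next() and print the shapes (a sketch, using the same arguments as in the training code below):

gen = data_generator(train_pairs, train_labels, 32, os.path.join(extract_path, 'train'))
for i in range(3):
    (left_batch, right_batch), label_batch = next(gen)
    # All three shapes should keep the same leading batch dimension on every iteration.
    print(i, left_batch.shape, right_batch.shape, label_batch.shape)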
Model Training Code:
batch_size = 32
train_gen = data_generator(train_pairs, train_labels, batch_size, os.path.join(extract_path, 'train'))
val_gen = data_generator(val_pairs, val_labels, batch_size, os.path.join(extract_path, 'train'))
model.compile(optimizer='rmsprop', loss=contrastive_loss)
model.fit(
    train_gen,
    validation_data=val_gen,
    steps_per_epoch=len(train_pairs) // batch_size,
    validation_steps=len(val_pairs) // batch_size,
    epochs=10
)
I've verified that both y_true and y_pred have the shape (None, 1) during the forward pass, as the print output included in the error above shows. I'm unsure why the error reports a shape of [32,1].
Does anyone have any ideas on what might be causing this or how to resolve it?
I have tried adding print messages in the loss function to inspect the shapes, and double-checked the shapes produced by my generator function, which looked right. I expected to find a shape mismatch between y_true and y_pred, but they were the same, which puzzled me. I searched for solutions all over the internet but couldn't resolve it.
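Part of the confusion is that a plain print inside a Keras loss only runs while the function is being traced, so it always shows the symbolic (None, 1) shape rather than the shape of each real batch. A small sketch of how to log the runtime shapes instead (contrastive_loss_debug is just an illustrative wrapper name):

import tensorflow as tf

def contrastive_loss_debug(y_true, y_pred):
    # tf.print runs at execution time, so it shows the concrete batch shapes
    # (e.g. [32 1], or [0 1] when an empty batch sneaks through).
    tf.print("y_true:", tf.shape(y_true), "y_pred:", tf.shape(y_pred))
    return contrastive_loss(y_true, y_pred)

model.compile(optimizer='rmsprop', loss=contrastive_loss_debug)

# Alternatively, run the whole training step eagerly while debugging:
# model.compile(optimizer='rmsprop', loss=contrastive_loss, run_eagerly=True)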
UPDATE: Issue fixed. The problem was in the generator, where I was overwriting labels inside the batching loop, so from the second iteration onwards no labels were returned, which caused the shape mismatch. The line labels = labels[start_idx:end_idx] should be changed to batch_labels = labels[start_idx:end_idx].
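In other words, the inner loop should slice into a separate variable and yield that (the later references to labels inside the loop switch to batch_labels as well):

for start_idx in range(0, num_samples, batch_size):
    end_idx = min(start_idx + batch_size, num_samples)
    batch_pairs = pairs[start_idx:end_idx]
    batch_labels = labels[start_idx:end_idx]  # new variable; `labels` stays intact

    left_images = []
    right_images = []
    for pair in batch_pairs:
        left_img_id, right_img_id = pair
        left_images.append(load_and_preprocess_image(left_img_id, img_dir, 'left'))
        right_images.append(load_and_preprocess_image(right_img_id, img_dir, 'right'))

    yield [np.array(left_images), np.array(right_images)], np.array(batch_labels)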