
TensorFlow Segmentation Fault on M1 Mac


I am creating a data-loading pipeline for image segmentation data. After the data loads, it goes through an augmentation pipeline via Albumentations.

Here is the full code to the (parent) function: https://pastebin.com/0ZaeWnPz

Here is the source of the error:

    train_ds = tf.data.Dataset.zip((train_images, train_masks))
    val_ds = tf.data.Dataset.zip((validation_images, validation_masks))
    if augment is None:
        return train_ds, val_ds

    @tf.function
    def augment_func(image, mask):
        data = {"image": image, "mask": mask}
        data_aug = UNIVERSAL_AUGMENTATION_FUNCTION(**data)
        img_aug = data_aug["image"]
        mask_aug = data_aug["mask"]
        aug_img = tf.cast(img_aug/255.0, tf.float32)
        aug_mask = tf.cast(mask_aug/255.0, tf.float32)
        return aug_img, aug_mask

    @tf.function
    def process_data(image, mask):
        aug_img, aug_mask = tf.numpy_function( # ERROR OCCURS HERE
            augment_func, inp=[image, mask], Tout=tf.float32)
        return aug_img, aug_mask

    @tf.function
    def set_shapes(image, mask, img_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), msk_shape=(IMAGE_SIZE, IMAGE_SIZE)):
        image.set_shape(img_shape)
        mask.set_shape(msk_shape)

    train_ds = train_ds.repeat(augmentation_expansion)
    train_ds = train_ds.map(partial(process_data), num_parallel_calls=tf.data.AUTOTUNE).map(
        set_shapes, num_parallel_calls=tf.data.AUTOTUNE).prefetch(tf.data.AUTOTUNE)

I have seen other SO questions and GitHub issues, but it seems their errors occur at a different point in the pipeline and on a different/older version of TensorFlow.

TensorFlow Version: 2.9.2

Python Version: 3.8.13

OS: macOS 12.5

Mac Type: M1 Pro (I feel it may have to do with this...)

I have tried adding @tf.function decorators as other posts have suggested, but I still get the same error. I have also tried using tf.map_fn() instead of train_ds.map, but that did not work either. Can someone please guide me on what I am doing wrong?


Solution

  • augment_func returns two tensors, but by specifying Tout=tf.float32 you tell TF that the function returns a single tensor of dtype tf.float32. TF then treats aug_img and aug_mask as two tensors obtained by iterating over that single output tensor along its first dimension. Iterating over a tensor is supported only under Eager Execution, i.e., you can do

    a, b = tf.random.uniform(shape=(2, 100))  # a.shape == b.shape == (100,)
    

    but not

    @tf.function
    def split(x):
        a, b = x
        return a, b
    
    split(tf.random.uniform((2,100)))  # Error: Operator not allowed in graph
    

    because functions decorated with tf.function are executed using Graph Execution instead of Eager Execution.

    So, simply specify Tout=[tf.float32, tf.float32] to match the two returned tensors of augment_func.
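    For reference, here is a minimal self-contained sketch of process_data with the corrected Tout. The Albumentations call is replaced by a placeholder normalization (a stand-in, not the asker's actual augmentation), and the 4×4 arrays are dummy inputs:

    ```python
    import numpy as np
    import tensorflow as tf

    def augment_func(image, mask):
        # Placeholder standing in for the Albumentations augmentation:
        # tf.numpy_function hands this function numpy arrays, and it must
        # return numpy arrays matching the dtypes listed in Tout.
        aug_img = (image / 255.0).astype(np.float32)
        aug_mask = (mask / 255.0).astype(np.float32)
        return aug_img, aug_mask

    def process_data(image, mask):
        # Tout lists one dtype per returned value, so TF correctly
        # unpacks the two outputs instead of iterating over one tensor.
        aug_img, aug_mask = tf.numpy_function(
            augment_func, inp=[image, mask], Tout=[tf.float32, tf.float32])
        return aug_img, aug_mask

    # Dummy inputs to exercise the function eagerly.
    img = tf.constant(np.zeros((4, 4, 3), dtype=np.uint8))
    msk = tf.constant(np.zeros((4, 4), dtype=np.uint8))
    a, b = process_data(img, msk)
    ```
    
    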

    On a separate note, a map function must always return the transformed data, but your map function of

    def set_shapes(image, mask, img_shape=(IMAGE_SIZE, IMAGE_SIZE, 3), msk_shape=(IMAGE_SIZE, IMAGE_SIZE)):
        image.set_shape(img_shape)
        mask.set_shape(msk_shape)
    

    does not return anything. You need to return image and mask from it as well.
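    A corrected set_shapes might look like this minimal sketch (IMAGE_SIZE = 128 is a hypothetical value standing in for the question's constant):

    ```python
    import tensorflow as tf

    IMAGE_SIZE = 128  # hypothetical; substitute your own constant

    def set_shapes(image, mask,
                   img_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
                   msk_shape=(IMAGE_SIZE, IMAGE_SIZE)):
        # set_shape only annotates static shape information; the tensors
        # themselves are unchanged, so they must still be returned for
        # Dataset.map to pass them along.
        image.set_shape(img_shape)
        mask.set_shape(msk_shape)
        return image, mask
    ```

    It is then used exactly as in the question, e.g. `train_ds.map(set_shapes, num_parallel_calls=tf.data.AUTOTUNE)`.
    
    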