Tags: python, multithreading, tensorflow, pydicom, medical-imaging

Unable to load DICOM images using Pydicom inside tf.data.Dataset


I'm trying to build a data generator that loads DICOM images in batches for use in model.fit with TensorFlow:

import tensorflow as tf

# file_names, annotations and the mapped functions are defined elsewhere
dataset = tf.data.Dataset.from_tensor_slices((file_names, annotations))

dataset = dataset.map(find_path, num_parallel_calls=tf.data.AUTOTUNE)   # file name -> path
dataset = dataset.map(read_dicom, num_parallel_calls=tf.data.AUTOTUNE)  # path -> pixels (fails here)
dataset = dataset.map(augment_images, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.map(resize_images, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.map(normalize_bbox, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.batch(64, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)

  • The first function (find_path) converts a file name to a path.

  • The second function (read_dicom) tries to load the DICOM images. It's something like this:

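    from pydicom import dcmread
    from pydicom.pixel_data_handlers.util import apply_voi_lut  # pydicom 2.x import path
    ...
    
    def read_dicom(path, annotation):
        # dcmread needs a real file path or file-like object
        raw_dicom = dcmread(path)
        # Apply the VOI LUT / windowing stored in the file to the raw pixels
        image = apply_voi_lut(raw_dicom.pixel_array, raw_dicom)
    
        return image, annotation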
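By contrast, a path-building step like find_path can stay inside the graph if it only uses TensorFlow string ops. The real find_path isn't shown in the question, so this is a minimal sketch that assumes a hypothetical DATA_DIR prefix and file names that already include the .dcm extension.

```python
import tensorflow as tf

DATA_DIR = "/data/dicom"  # hypothetical location of the DICOM files

def find_path(file_name, annotation):
    # tf.strings.join operates on symbolic tensors, so this traces cleanly
    # inside Dataset.map (Python functions such as os.path.join cannot).
    path = tf.strings.join([DATA_DIR, file_name], separator="/")
    return path, annotation

dataset = tf.data.Dataset.from_tensor_slices((["a.dcm", "b.dcm"], [0, 1]))
dataset = dataset.map(find_path, num_parallel_calls=tf.data.AUTOTUNE)

for path, label in dataset:
    print(path.numpy(), label.numpy())
```

Only reading and decoding the file needs a graph-compatible replacement; string manipulation like this already works with num_parallel_calls.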
    

I get an error in dcmread(path) that says:

TypeError: dcmread: Expected a file path or a file-like, but got SymbolicTensor

I don't fully understand the situation, but from what I know, tf.data.Dataset.map traces every mapped function into a TensorFlow graph (this happens even with eager execution enabled, not only inside model.fit). As a result, every argument passed into the function is a SymbolicTensor rather than a concrete value, so the path is a SymbolicTensor and can't be used with Pydicom or any other non-TensorFlow library.
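One way to observe this is to record, from inside a mapped function, whether TensorFlow is executing eagerly. This is a minimal sketch with a hypothetical inspect_path function: its Python body runs while tf.data traces the graph, so tf.executing_eagerly() reports False there, yet the finished graph still yields the real strings when the dataset is iterated.

```python
import tensorflow as tf

observed = {}

def inspect_path(path):
    # This Python code runs while tf.data traces the function into a graph,
    # not once per element, so `path` is a placeholder without a concrete value.
    observed["eager"] = tf.executing_eagerly()
    return path

dataset = tf.data.Dataset.from_tensor_slices(["a.dcm", "b.dcm"])
dataset = dataset.map(inspect_path)

values = [p.numpy() for p in dataset]
print(observed)  # recorded during tracing
print(values)    # the traced graph produces the real strings at run time
```

Any library call that needs the actual string (such as dcmread) therefore has to either be a TensorFlow op or be wrapped so that it runs outside the traced graph.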


I tried several methods to fix this problem, but they either don't work or aren't suitable for the project I'm working on:

  1. Using tf.py_function to prevent TensorFlow from converting read_dicom into a graph

    This method works, but it isn't usable for me because I want to use TensorFlow's threading to load multiple images simultaneously. With tf.py_function, the wrapped code runs as ordinary Python, which holds the GIL and prevents the threads from running at the same time.

  2. Using the TensorFlow I/O library to load the image

    import tensorflow as tf
    import tensorflow_io as tfio
    ...
    
    def read_dicom(path, annotation):
        raw_image = tf.io.read_file(path)
        image = tfio.image.decode_dicom_image(raw_image)
    
        return image, annotation
    

    This method doesn't work because tfio.image.decode_dicom_image from the TensorFlow I/O library fails. You can check the Colab notebook that TensorFlow provides in its DICOM tutorial; it doesn't work there either!

    https://www.tensorflow.org/io/tutorials/dicom
    https://colab.research.google.com/github/tensorflow/io/blob/master/docs/tutorials/dicom.ipynb
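For reference, approach 1 (wrapping the Python reader in tf.py_function) can be sketched as follows. load_pixels is a hypothetical stand-in for the dcmread + apply_voi_lut reader, so the snippet runs without any DICOM files. Because the wrapped function executes as ordinary Python, it still runs under the GIL even when num_parallel_calls is set.

```python
import numpy as np
import tensorflow as tf

def load_pixels(path_tensor):
    # HYPOTHETICAL stand-in for dcmread(path) + apply_voi_lut(...).
    # Inside tf.py_function the argument is an eager tensor, so .numpy() works.
    path = path_tensor.numpy().decode("utf-8")
    return np.full((4, 4), float(len(path)), dtype=np.float32)

def read_dicom_py(path, annotation):
    image = tf.py_function(func=load_pixels, inp=[path], Tout=tf.float32)
    image.set_shape([None, None])  # py_function drops static shape info
    return image, annotation

paths = ["a.dcm", "bb.dcm"]
labels = [0, 1]
dataset = tf.data.Dataset.from_tensor_slices((paths, labels))
dataset = dataset.map(read_dicom_py, num_parallel_calls=tf.data.AUTOTUNE)

for image, label in dataset:
    print(image.shape, label.numpy())
```

The set_shape call matters in practice: without it, later graph steps such as resizing or batching cannot infer the image rank.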

I should note that I want to use TensorFlow's built-in multithreading.

Do you have any idea how I can fix this problem?


Solution

  • Actually, tfio.image.decode_dicom_image does work. You only need to make sure your tensorflow-io version is compatible with your TensorFlow version. See the version-compatibility table in the tensorflow-io README.

    Test run:

    import numpy as np
    import tensorflow as tf
    import tensorflow_io as tfio
    import matplotlib.pyplot as plt
    
    # Sample file used in the TensorFlow I/O DICOM tutorial
    file_name = np.array(['dicom_00000001_000.dcm'])
    annotation = np.array([1])
    
    dataset = tf.data.Dataset.from_tensor_slices((file_name, annotation))
    
    def read_dicom(path, annotation):
        # Both calls are TensorFlow ops, so they trace cleanly inside Dataset.map
        raw_image = tf.io.read_file(path)
        image = tfio.image.decode_dicom_image(raw_image)
    
        return image, annotation
    
    dataset = dataset.map(read_dicom)
    
    for x, y in dataset:
        plt.imshow(x.numpy().squeeze(), cmap='gray')
        plt.show()
    

    This displays the same image that is used in the tutorial.
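To plug the decoded image back into the original pipeline (resize, normalize, batch), the remaining steps can also be written with TensorFlow ops so every map call stays parallelizable. This is a sketch under stated assumptions: it assumes the decoded tensor has shape (frames, rows, columns, channels) with the default uint16 dtype, keeps only the first frame, uses a hypothetical TARGET_SIZE and to_model_input helper, and feeds a synthetic tensor in place of a real decoded DICOM file.

```python
import tensorflow as tf

TARGET_SIZE = (256, 256)  # hypothetical model input size

def to_model_input(image, annotation):
    # Assumption: image is (frames, rows, cols, channels), e.g. the uint16
    # output of tfio.image.decode_dicom_image; keep only the first frame.
    frame = tf.cast(image[0], tf.float32)
    # Min-max scale to [0, 1]; divide_no_nan yields 0 for a constant image.
    min_val = tf.reduce_min(frame)
    max_val = tf.reduce_max(frame)
    frame = tf.math.divide_no_nan(frame - min_val, max_val - min_val)
    frame = tf.image.resize(frame, TARGET_SIZE)  # bilinear by default
    return frame, annotation

# Synthetic stand-in for a decoded 2-frame, 8x8, single-channel image
fake = tf.cast(tf.reshape(tf.range(2 * 8 * 8), (2, 8, 8, 1)), tf.uint16)

dataset = tf.data.Dataset.from_tensors((fake, 1))
dataset = dataset.map(to_model_input, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.batch(64)

for images, labels in dataset:
    print(images.shape, labels.numpy())
```

Scaling per image is only one option; a fixed window based on the DICOM metadata would need the values read in a separate step.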