python python-3.x numpy performance pydicom

Reading the First Frame of a Large DICOM File

Suppose a large DICOM file is given (for example https://drive.google.com/drive/folders/1ejY0CjfEwS6SGS2qe_uRX2JvlruMKvPX?usp=sharing). I need to read the first frame of its pixel array as a numpy array, as quickly as possible.

import pydicom

directory = "path/to/file.dcm"  # placeholder: path to the DICOM file
dicom = pydicom.dcmread(directory)

Now, as mentioned in some other posts, the following line completes the task:

first_image = dicom.pixel_array[0]

But my pixel array has shape (1691, 555, 800, 3), so dicom.pixel_array takes about 12 seconds to run. Since I need the first image from many such DICOM files, I need a much faster approach.

My attempt:

I tried to use the raw pixel data, dicom[0x7fe0,0x0010]._value, which is a bytes object. I wanted to extract the bytes belonging to the first image and convert them to numpy, but I cannot work out which portion of the pixel data corresponds to the first image. The posts http://dicomiseasy.blogspot.com/2012/08/chapter-12-pixel-data.html and https://groups.google.com/g/dcm4che/c/ZQC2goCadiQ turned out not to be very helpful: they give the formula ROWS * COLUMNS * NUMBER_OF_FRAMES * SAMPLES_PER_PIXEL * (BITS_ALLOCATED/8), but even the per-frame size ROWS * COLUMNS * SAMPLES_PER_PIXEL * (BITS_ALLOCATED/8), which equals 1332000 in my case, does not divide 122320858, the length of the pixel data.
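
The mismatch can be made concrete with a quick size check. The sketch below is only an illustration: the helper name is hypothetical, and BITS_ALLOCATED = 8 is an assumption inferred from the 1332000 figure above. If the stored pixel data is far shorter than the uncompressed size, that is consistent with compressed (encapsulated) pixel data, in which case frames do not sit at fixed byte offsets and the formula cannot be used to slice them out.

```python
def uncompressed_frame_bytes(rows, columns, samples_per_pixel, bits_allocated):
    """Size in bytes of one uncompressed frame.

    Assumes bits_allocated is a multiple of 8 (hypothetical helper).
    """
    return rows * columns * samples_per_pixel * (bits_allocated // 8)


# Values taken from the question; bits_allocated=8 is an assumption.
frame_bytes = uncompressed_frame_bytes(555, 800, 3, 8)
total_bytes = frame_bytes * 1691          # all 1691 frames, uncompressed
actual_bytes = 122320858                  # reported length of the pixel data

print(frame_bytes)                        # 1332000
print(total_bytes)                        # 2252412000
print(actual_bytes < total_bytes)         # True: data is much smaller than raw
```

With pydicom, the transfer syntax can be inspected through dicom.file_meta.TransferSyntaxUID (its is_compressed property) to confirm whether this is the case for a given file.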

Solution

As described in this GitHub issue, native pydicom currently has no solution for this. You can instead use the ImageFileReader class from the highdicom.io submodule of the highdicom package. For completeness, here is the example from the documentation, which reads a multi-frame DICOM file one frame at a time:

>>> from pydicom.data import get_testdata_file
>>> from highdicom.io import ImageFileReader

>>> test_filepath = get_testdata_file('eCT_Supplemental.dcm')
>>>
>>> with ImageFileReader(test_filepath) as image:
...     print(image.metadata.SOPInstanceUID)
...     for i in range(image.number_of_frames):
...         frame = image.read_frame(i)
...         print(frame.shape)
1.3.6.1.4.1.5962.1.1.10.3.1.1166562673.14401
(512, 512)
(512, 512)

Since your data do not contain an ICC profile, you should call read_frame with the option correct_color=False. You should also comment out the first print, because that attribute is missing from your file and accessing it raises an attribute error while reading the metadata. With these changes, the example adapted to your data looks like this:

>>> with ImageFileReader(test_filepath) as image:
...     # print(image.metadata.SOPInstanceUID)
...     for i in range(image.number_of_frames):
...         frame = image.read_frame(i, correct_color=False)
...         print(frame.shape)
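
Since the question only asks for the first frame, the loop is not needed. A minimal sketch, assuming highdicom is installed and that read_frame takes a zero-based frame index as in the loop above; the function name read_first_frame is hypothetical:

```python
def read_first_frame(path):
    """Return only the first frame of a multi-frame DICOM file as a numpy array.

    Hypothetical helper; relies on highdicom's ImageFileReader as used above.
    """
    # Imported here so this module still loads where highdicom is absent.
    from highdicom.io import ImageFileReader

    with ImageFileReader(path) as image:
        # Index 0 is the first frame; correct_color=False skips the ICC
        # profile step, which the data in the question cannot support.
        return image.read_frame(0, correct_color=False)
```

Calling first_image = read_first_frame(directory) on each file would then avoid decoding the remaining frames.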

For further issues, check the highdicom documentation first.