Tags: python, excel, pydicom

Error while converting DICOM tags to Excel using Python


I am trying to list DICOM tags from .dcm files in an Excel file (using pydicom), but some tags (Patient's Name, Patient ID, etc.) produce errors during conversion.

Other tags (SOP Class UID, SOP Instance UID, etc.) show 'None' in the Excel file even though the DICOM file contains data for them. How can I resolve this?

import xlsxwriter 
import sys 
import pydicom 
import os.path
from pydicom.valuerep import PersonName
keywords = ("Patient's Name",
            "Patient ID",
            "Patient's Birth Date",
            "Patient's Sex",
            "SOP Class UID",
            "SOP Instance UID",
            "Group Length",
            "Manufacturer",
            "Referring Physician's Name",
            "Study ID",
            "Patient Orientation",
            "Series Number",
            "Pixel Data",
            "Group Length",
            "Rows",
            "Columns",
           )

# ...
            
dcm_files = [r"C:\Users\akhil\Downloads\Sample_Dataset\Sample_Dataset\PRASANNA_KUMARI\21_12_2013_11_13_46_AM\IMG-0001-00001.dcm"]   

for dcm_file in dcm_files:
    ds = pydicom.filereader.dcmread(dcm_file)
    workbook = xlsxwriter.Workbook(os.path.basename(dcm_file) + '.xlsx')
    worksheet = workbook.add_worksheet()

    row = 0
    col = 0

    for keyword in keywords:
        value = ds.get(keyword, "None")
        if isinstance(value, list):
            value = ", ".join([str(x) for x in value])
        elif isinstance(value, PersonName):
            value = str(value)
        worksheet.write(row, col, keyword)
        worksheet.write(row + 1, col, value)
        col += 1

    workbook.close()  # close inside the loop so that every file's workbook is saved

Some tags from the DICOM file:

(0008, 0005) Specific Character Set              CS: 'ISO_IR 100'
(0008, 0016) SOP Class UID                       UI: Secondary Capture Image Storage
(0008, 0018) SOP Instance UID                    UI: 1.2.300.0.7230010.3.1.4.3397350519.8248.1599586949.14
(0008, 0020) Study Date                          DA: '20200908'
(0008, 0021) Series Date                         DA: '20200908'
(0008, 0022) Acquisition Date                    DA: '20200908'
(0008, 0023) Content Date                        DA: '20200908'
(0008, 0030) Study Time                          TM: '155900'
(0008, 0031) Series Time                         TM: '155900'
(0008, 0032) Acquisition Time                    TM: '155900'
(0008, 0033) Content Time                        TM: '155900'
(0008, 0050) Accession Number                    SH: ''
(0008, 0060) Modality                            CS: 'OT'
(0008, 0064) Conversion Type                     CS: ''
(0008, 0070) Manufacturer                        LO: 'SANTESOFT'
(0008, 0090) Referring Physician's Name          PN: ''
(0010, 0000) Group Length                        UL: 48
(0010, 0010) Patient's Name                      PN: 'NO^NAME'
(0010, 0020) Patient ID                          LO: '00000001'
(0010, 0030) Patient's Birth Date                DA: ''
(0010, 0040) Patient's Sex                       CS: ''
(0018, 0000) Group Length                        UL: 14
(0018, 1063) Frame Time                          DS: "33.0"

Solution

  • You are not using the correct keywords here. First, DICOM keywords do not contain the 's part, e.g. it's called "Patient Name", not "Patient's Name" (this was changed in the DICOM standard about 15 years ago or so).
    Second, the keywords do not contain spaces, so if you want to keep the names with spaces for readability, you have to remove the spaces for the lookup, for example:

    keywords = ("Patient Name",
                "Patient ID",
                "Patient Birth Date",
                "Patient Sex",
                "SOP Class UID",
                "SOP Instance UID",
                "Group Length",
                "Manufacturer",
                "Referring Physician Name",
                "Study ID",
                "Patient Orientation",
                "Series Number",
                "Group Length",
                "Rows",
                "Columns",
                )
    
    ...
    
    for dcm_file in dcm_files:
        ds = pydicom.filereader.dcmread(dcm_file)
        ...
        for keyword in keywords:
            dcm_keyword = keyword.replace(' ', '')  # remove the spaces for the lookup
            value = ds.get(dcm_keyword, "None")
    

    Note that I have removed all the apostrophes from the tag names, and I have also removed Pixel Data. Converting binary data to a string would not work correctly, and you certainly don't want to display pixel data in an Excel table.
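
If you would rather keep the original display names (including the 's) in the spreadsheet header, the two points above can be sketched as a pair of small helpers. This is only a minimal sketch: `to_keyword` and `to_cell_text` are hypothetical names, not part of pydicom, and the second helper assumes that multi-valued elements behave like ordinary Python sequences.

```python
from collections.abc import Sequence


def to_keyword(display_name: str) -> str:
    """Turn a readable name such as "Patient's Name" into "PatientName"."""
    # Drop the possessive 's first, then the spaces.
    return display_name.replace("'s", "").replace(" ", "")


def to_cell_text(value) -> str:
    """Return a plain string that can be written into a spreadsheet cell."""
    if isinstance(value, (bytes, bytearray)):
        # Binary values (e.g. pixel data) are summarized instead of dumped.
        return f"<{len(value)} bytes>"
    if isinstance(value, str):
        # Covers plain strings and str subclasses.
        return str(value)
    if isinstance(value, Sequence):
        # Assumption: multi-valued elements are sequence-like; join the items.
        return ", ".join(str(item) for item in value)
    # Numbers, person names and any other single value.
    return str(value)


print(to_keyword("Referring Physician's Name"))
print(to_cell_text(["L", "P"]))
```

Inside the loop, the lookup would then presumably read `value = to_cell_text(ds.get(to_keyword(keyword), "None"))`, while `keyword` itself is still written to the header row unchanged.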