python, image, opencv, image-processing, scikit-image

How to remove mammography tag artifacts


I have a mammography image dataset (mini DDSM). The images contain letter artifacts that mark the left or right breast, along with other information that is useless for my ML model, so I want to clean the dataset before training the model.

In this paper, Preprocessing of Digital Mammogram Image Based on Otsu’s Threshold, they use Otsu's binarization and opening on the mammography to clean the image (page 5 of 10):

Their results

So far, I have coded this:

import matplotlib.pyplot as plt
from skimage import io, filters, morphology

im = io.imread('/content/drive/MyDrive/TFM/DDSMPNG/ALL2/0.jpg')

# thresholding
thresh = im > filters.threshold_otsu(im)

# opening with a disk structure
disk = morphology.disk(5)
opened = morphology.binary_opening(thresh,disk)

# plotting

plt.figure(figsize=(10, 10))

plt.subplot(131)
plt.imshow(im,cmap='gray')
plt.subplot(132)
plt.imshow(opened,cmap='gray')

plt.imsave('/content/drive/MyDrive/TFM/DDSMPNG/Blackened/0.jpg',opened)

And these are the plots:

Results

I have also tried a larger disk radius for the opening. It seems to remove more of the white from the small letter artifact, but it also crops part of the breast:

disk = morphology.disk(45)
opened = morphology.binary_opening(thresh,disk)

The result:

Result with disk shape (45,45)

I guess I will have to create some kind of mask from the binarization and apply it to the original image, but I am new to image processing libraries and I'm not sure how to achieve this.
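A minimal sketch of that idea with scikit-image, assuming the breast is the largest bright connected region after Otsu thresholding. The function name `keep_largest_bright_region` and the synthetic image are illustrative only; they are not taken from the paper or from the dataset:

```python
import numpy as np
from skimage import filters, measure


def keep_largest_bright_region(im):
    """Zero out every pixel outside the largest Otsu-foreground component."""
    # binarize with Otsu's threshold (True = bright pixel)
    thresh = im > filters.threshold_otsu(im)
    # label connected bright regions; label 0 is the background
    labels = measure.label(thresh)
    if labels.max() == 0:
        return im.copy()  # nothing above the threshold, leave image untouched
    # count pixels per label and ignore the background count
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0
    # boolean mask of the largest region, applied to the original image
    mask = labels == sizes.argmax()
    return np.where(mask, im, 0).astype(im.dtype)


# synthetic grayscale stand-in: a large "breast" block and a small bright "tag"
demo = np.zeros((100, 100), dtype=np.uint8)
demo[10:90, 0:50] = 150
demo[5:15, 80:95] = 255
cleaned = keep_largest_bright_region(demo)
```

The mask here is used only to select pixels, so the gray levels inside the kept region are the original ones rather than the binarized values.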

EDIT 1: I tried @fmw42's suggestion and ran into some issues (I work on Google Colab; I don't know whether that has something to do with it):

First, with the image used as the example in your code, it doesn't seem to work properly, and I don't know why. I copied your code and only changed the path to the image and added some subplots to see the results:

import cv2
import numpy as np
import matplotlib.pyplot as plt

# read image
img = cv2.imread('/content/drive/MyDrive/TFM/DDSMPNG/ALL2/0.jpg')
hh, ww = img.shape[:2]

# convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# apply otsu thresholding
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU )[1] 

# apply morphology close to fill small dark gaps and holes
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)

# apply morphology open to separate breast from other regions
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)

# get largest contour
contours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
big_contour = max(contours, key=cv2.contourArea)

# draw largest contour as white filled on black background as mask
mask = np.zeros((hh,ww), dtype=np.uint8)
cv2.drawContours(mask, [big_contour], 0, 255, cv2.FILLED)

# dilate mask
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (55,55))
mask = cv2.morphologyEx(mask, cv2.MORPH_DILATE, kernel)

# apply mask to image
result = cv2.bitwise_and(img, img, mask=mask)

# save results

cv2.imwrite('/content/drive/MyDrive/TFM/DDSMPNG/Blackened/0.jpg', result)

# show results

plt.figure(figsize=(10, 10))

plt.subplot(141)
plt.imshow(thresh,cmap='gray')
plt.subplot(142)
plt.imshow(morph,cmap='gray')
plt.subplot(143)
plt.imshow(mask,cmap='gray')
plt.subplot(144)
plt.imshow(result,cmap='gray')

Results: [image]

Second, for the rest of the images it works well in most cases, but it crops part of the breast surface:

[image]

Your result image looks much smoother. How can I achieve that?

Thanks in advance!

EDIT 2: @fmw42's solution works fine. If someone has the same issue, you only need to adjust the kernel sizes of the morphological filters until the output matches the results shown in his answer.

Thank you so much!


Solution

  • Here is one way to process your image in Python/OpenCV.

     - Read the input
     - Convert to grayscale
     - Otsu threshold
     - Morphology processing
     - Get largest contour from external contours
     - Draw all contours as white filled on a black background except the largest as a mask and invert mask
     - Apply the mask to the input image
     - Save the results
    

    Input: [image]

    import cv2
    import numpy as np
    
    # read image
    img = cv2.imread("mammogram.png")
    hh, ww = img.shape[:2]
    
    # convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    # apply otsu thresholding
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU )[1] 
    
    # apply morphology close to fill small dark gaps and holes
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
    morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
    
    # apply morphology open to separate breast from other regions
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
    morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)
    
    # apply morphology dilate to compensate for otsu threshold not getting some areas
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (29,29))
    morph = cv2.morphologyEx(morph, cv2.MORPH_DILATE, kernel)
    
    # get largest contour
    contours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if len(contours) == 2 else contours[1]
    big_contour = max(contours, key=cv2.contourArea)
    big_contour_area = cv2.contourArea(big_contour)
    
    # draw all contours but the largest as white filled on black background as mask
    mask = np.zeros((hh,ww), dtype=np.uint8)
    for cntr in contours:
        area = cv2.contourArea(cntr)
        if area != big_contour_area:
            cv2.drawContours(mask, [cntr], 0, 255, cv2.FILLED)
        
    # invert mask
    mask = 255 - mask
    
    # apply mask to image
    result = cv2.bitwise_and(img, img, mask=mask)
    
    # save results
    cv2.imwrite('mammogram_thresh.jpg', thresh)
    cv2.imwrite('mammogram_morph.jpg', morph)
    cv2.imwrite('mammogram_mask.jpg', mask)
    cv2.imwrite('mammogram_result.jpg', result)
    
    # show results
    cv2.imshow('thresh', thresh)
    cv2.imshow('morph', morph)
    cv2.imshow('mask', mask)
    cv2.imshow('result', result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    

    Thresholded Image: [image]

    Morphology Processed Image: [image]

    Mask Image From Contours: [image]

    Result Image: [image]

    Alternate

    - Read the input
    - Convert to grayscale
    - Otsu threshold
    - Morphology processing
    - Get largest contour from external contours
    - Draw largest as white filled on black background as a mask 
    - Dilate mask
    - Apply the mask to the input image
    - Save the results
    

    Input:

    import cv2
    import numpy as np
    
    # read image
    img = cv2.imread("mammogram.png")
    hh, ww = img.shape[:2]
    
    # convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    # apply otsu thresholding
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU )[1] 
    
    # apply morphology close to fill small dark gaps and holes
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
    morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
    
    # apply morphology open to separate breast from other regions
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
    morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)
    
    # get largest contour
    contours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if len(contours) == 2 else contours[1]
    big_contour = max(contours, key=cv2.contourArea)
    
    # draw largest contour as white filled on black background as mask
    mask = np.zeros((hh,ww), dtype=np.uint8)
    cv2.drawContours(mask, [big_contour], 0, 255, cv2.FILLED)
    
    # dilate mask
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (55,55))
    mask = cv2.morphologyEx(mask, cv2.MORPH_DILATE, kernel)
    
    # apply mask to image
    result = cv2.bitwise_and(img, img, mask=mask)
    
    # save results
    cv2.imwrite('mammogram_thresh.jpg', thresh)
    cv2.imwrite('mammogram_morph2.jpg', morph)
    cv2.imwrite('mammogram_mask2.jpg', mask)
    cv2.imwrite('mammogram_result2.jpg', result)
    
    # show results
    cv2.imshow('thresh', thresh)
    cv2.imshow('morph', morph)
    cv2.imshow('mask', mask)
    cv2.imshow('result', result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    

    Threshold Image: [image]

    Morphology Processed Image: [image]

    Mask Image: [image]

    Result: [image]

    ADDITION

    Here is the second method of processing applied to your larger JPG image. I noted that it was about 6x larger in width and height, so I increased the close and open morphology kernels by about 6x, from 5 to 31, and enlarged the dilation kernel from 55 to 305. I also trimmed 40 pixels from the image borders all around and then added back a black border of the same size.
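The kernel scaling described above can be sketched as a small helper. This is an illustration and not part of the original answer: `scaled_kernel_size` is a hypothetical name, and bumping even results up to the next odd integer is an assumption made so that the structuring element has a well-defined center pixel.

```python
def scaled_kernel_size(base_size, scale):
    """Scale a kernel size and bump even results up to the next odd integer."""
    size = max(1, int(round(base_size * scale)))
    return size if size % 2 == 1 else size + 1


# the 5x5 close/open kernels, scaled for an image about 6x larger
print(scaled_kernel_size(5, 6))  # 31
```

With a scale of 6 the 5-pixel close and open kernels become 31, as in the answer. The 305-pixel dilation kernel does not follow from the same rule (55 x 6 = 330), so treat the helper as a starting point and tune from there.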

    Input: [image]

    import cv2
    import numpy as np
    
    # read image
    img = cv2.imread("mammogram.jpg")
    hh, ww = img.shape[:2]
    
    # convert to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    
    # shave 40 pixels all around
    gray = gray[40:hh-40, 40:ww-40]
    
    # add 40 pixel black border all around
    gray = cv2.copyMakeBorder(gray, 40,40,40,40, cv2.BORDER_CONSTANT, value=0)
    
    # apply otsu thresholding
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU )[1] 
    
    # apply morphology close to fill small dark gaps and holes
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (31,31))
    morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
    
    # apply morphology open to separate breast from other regions
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (31,31))
    morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)
    
    # get largest contour
    contours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contours = contours[0] if len(contours) == 2 else contours[1]
    big_contour = max(contours, key=cv2.contourArea)
    
    # draw largest contour as white filled on black background as mask
    mask = np.zeros((hh,ww), dtype=np.uint8)
    cv2.drawContours(mask, [big_contour], 0, 255, cv2.FILLED)
    
    # dilate mask
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (305,305))
    mask = cv2.morphologyEx(mask, cv2.MORPH_DILATE, kernel)
    
    # apply mask to image
    result = cv2.bitwise_and(img, img, mask=mask)
    
    # save results
    cv2.imwrite('mammogram_thresh.jpg', thresh)
    cv2.imwrite('mammogram_morph2.jpg', morph)
    cv2.imwrite('mammogram_mask2.jpg', mask)
    cv2.imwrite('mammogram_result2.jpg', result)
    
    # show results
    cv2.imshow('thresh', thresh)
    cv2.imshow('morph', morph)
    cv2.imshow('mask', mask)
    cv2.imshow('result', result)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
    

    Threshold Image: [image]

    Morphology Image: [image]

    Mask Image: [image]

    Result: [image]