I have a mammography image dataset (mini DDSM). These images show letter artifacts indicating left or right mamma and other useless information for my ML model, so I want to curate this dataset before training the model.
In this paper, Preprocessing of Digital Mammogram Image Based on Otsu’s Threshold, they use Otsu's binarization and opening on the mammography to clean the image (page 5 of 10):
So far, I have coded this:
im = io.imread('/content/drive/MyDrive/TFM/DDSMPNG/ALL2/0.jpg')
# thresholding
thresh = im > filters.threshold_otsu(im)
# opening with a disk structure
disk = morphology.disk(5)
opened = morphology.binary_opening(thresh,disk)
# plotting
plt.figure(figsize=(10, 10))
plt.subplot(131)
plt.imshow(im,cmap='gray')
plt.subplot(132)
plt.imshow(opened,cmap='gray')
plt.imsave('/content/drive/MyDrive/TFM/DDSMPNG/Blackened/0.jpg',opened)
And these are the plots:
I have also tried with a higher disk shape to do the opening, it seems to remove more white of the small letter artifact, but also crops a bit the mammography:
disk = morphology.disk(45)
opened = morphology.binary_opening(thresh,disk)
The result:
Result with disk shape (45,45)
I guess I will have to create some kind of mask with the binarization and apply it to the original image, but I am new to image processing libraries and I'm not sure how to achieve the results
EDIT 1: I tried @fmw42 suggestion and I have some issues with it (I work on Google Colab, dont know If it has something to do...):
First, with the image used as example on your code, it doesn't seem to work propperly, don't know why, I copied your code and just modified the path to the image as well as some subplots to see the results:
# read image
img = cv2.imread('/content/drive/MyDrive/TFM/DDSMPNG/ALL2/0.jpg')
hh, ww = img.shape[:2]
# convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# apply otsu thresholding
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU )[1]
# apply morphology close to remove small regions
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# apply morphology open to separate breast from other regions
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)
# get largest contour
contours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
big_contour = max(contours, key=cv2.contourArea)
# draw largest contour as white filled on black background as mask
mask = np.zeros((hh,ww), dtype=np.uint8)
cv2.drawContours(mask, [big_contour], 0, 255, cv2.FILLED)
# dilate mask
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (55,55))
mask = cv2.morphologyEx(mask, cv2.MORPH_DILATE, kernel)
# apply mask to image
result = cv2.bitwise_and(img, img, mask=mask)
# save results
cv2.imwrite('/content/drive/MyDrive/TFM/DDSMPNG/Blackened/0.jpg', result)
# show resultls
plt.figure(figsize=(10, 10))
plt.subplot(141)
plt.imshow(thresh,cmap='gray')
plt.subplot(142)
plt.imshow(morph,cmap='gray')
plt.subplot(143)
plt.imshow(mask,cmap='gray')
plt.subplot(144)
plt.imshow(result,cmap='gray')
Results:
Second, for the rest of the images, it seems to work well for most of them, but it crops a bit the breast surface:
In your result image, it seems to be much more smooth, how can I achieve that?
Thanks in advance!
EDIT 2: @fmw42 solution works fine, if someone has the same issue, you only need to play with the kernel sizes of the morphological filters until the image behaves like his results on the answer.
Thank you so much!
Here is one way to process your image in Python/OpenCV.
- Read the input
- Convert to grayscale
- Otsu threshold
- Morphology processing
- Get largest contour from external contours
- Draw all contours as white filled on a black background except the largest as a mask and invert mask
- Apply the mask to the input image
- Save the results
Input:
import cv2
import numpy as np
# read image
img = cv2.imread("mammogram.png")
hh, ww = img.shape[:2]
# convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# apply otsu thresholding
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU )[1]
# apply morphology close to remove small regions
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# apply morphology open to separate breast from other regions
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)
# apply morphology dilate to compensate for otsu threshold not getting some areas
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (29,29))
morph = cv2.morphologyEx(morph, cv2.MORPH_DILATE, kernel)
# get largest contour
contours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
big_contour = max(contours, key=cv2.contourArea)
big_contour_area = cv2.contourArea(big_contour)
# draw all contours but the largest as white filled on black background as mask
mask = np.zeros((hh,ww), dtype=np.uint8)
for cntr in contours:
area = cv2.contourArea(cntr)
if area != big_contour_area:
cv2.drawContours(mask, [cntr], 0, 255, cv2.FILLED)
# invert mask
mask = 255 - mask
# apply mask to image
result = cv2.bitwise_and(img, img, mask=mask)
# save results
cv2.imwrite('mammogram_thresh.jpg', thresh)
cv2.imwrite('mammogram_morph.jpg', morph)
cv2.imwrite('mammogram_mask.jpg', mask)
cv2.imwrite('mammogram_result.jpg', result)
# show resultls
cv2.imshow('thresh', thresh)
cv2.imshow('morph', morph)
cv2.imshow('mask', mask)
cv2.imshow('result', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Thresholded Image:
Morphology Processed Image:
Mask Image From Contours:
Result Image:
Alternate
- Read the input
- Convert to grayscale
- Otsu threshold
- Morphology processing
- Get largest contour from external contours
- Draw largest as white filled on black background as a mask
- Dilate mask
- Apply the mask to the input image
- Save the results
Input:
import cv2
import numpy as np
# read image
img = cv2.imread("mammogram.png")
hh, ww = img.shape[:2]
# convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# apply otsu thresholding
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU )[1]
# apply morphology close to remove small regions
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# apply morphology open to separate breast from other regions
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5,5))
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)
# get largest contour
contours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
big_contour = max(contours, key=cv2.contourArea)
# draw largest contour as white filled on black background as mask
mask = np.zeros((hh,ww), dtype=np.uint8)
cv2.drawContours(mask, [big_contour], 0, 255, cv2.FILLED)
# dilate mask
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (55,55))
mask = cv2.morphologyEx(mask, cv2.MORPH_DILATE, kernel)
# apply mask to image
result = cv2.bitwise_and(img, img, mask=mask)
# save results
cv2.imwrite('mammogram_thresh.jpg', thresh)
cv2.imwrite('mammogram_morph2.jpg', morph)
cv2.imwrite('mammogram_mask2.jpg', mask)
cv2.imwrite('mammogram_result2.jpg', result)
# show resultls
cv2.imshow('thresh', thresh)
cv2.imshow('morph', morph)
cv2.imshow('mask', mask)
cv2.imshow('result', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Threshold Image:
Morphology Processed Image:
Mask Image:
Result:
ADDITION
Here is the second method of processing applied to your larger JPG image. I noted that it was about 6x in width and height. So I increased the morphology kernels by about 6x from 5 to 31. I also trimmed the image borders 40 pixels all around and then added back a black border of the same amounts.
Input:
import cv2
import numpy as np
# read image
img = cv2.imread("mammogram.jpg")
hh, ww = img.shape[:2]
# convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# shave 40 pixels all around
gray = gray[40:hh-40, 40:ww-40]
# add 40 pixel black border all around
gray = cv2.copyMakeBorder(gray, 40,40,40,40, cv2.BORDER_CONSTANT, value=0)
# apply otsu thresholding
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU )[1]
# apply morphology close to remove small regions
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (31,31))
morph = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# apply morphology open to separate breast from other regions
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (31,31))
morph = cv2.morphologyEx(morph, cv2.MORPH_OPEN, kernel)
# get largest contour
contours = cv2.findContours(morph, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if len(contours) == 2 else contours[1]
big_contour = max(contours, key=cv2.contourArea)
# draw largest contour as white filled on black background as mask
mask = np.zeros((hh,ww), dtype=np.uint8)
cv2.drawContours(mask, [big_contour], 0, 255, cv2.FILLED)
# dilate mask
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (305,305))
mask = cv2.morphologyEx(mask, cv2.MORPH_DILATE, kernel)
# apply mask to image
result = cv2.bitwise_and(img, img, mask=mask)
# save results
cv2.imwrite('mammogram_thresh.jpg', thresh)
cv2.imwrite('mammogram_morph2.jpg', morph)
cv2.imwrite('mammogram_mask2.jpg', mask)
cv2.imwrite('mammogram_result2.jpg', result)
# show resultls
cv2.imshow('thresh', thresh)
cv2.imshow('morph', morph)
cv2.imshow('mask', mask)
cv2.imshow('result', result)
cv2.waitKey(0)
cv2.destroyAllWindows()
Threshold Image:
Morphology Image:
Mask Image:
Result: