How to automatically select best result from try_all_threshold?


I am applying thresholding to an image containing text and digits. Calling skimage.filters.try_all_threshold applies 7 thresholding algorithms to the image. I can get the results, but how can I dynamically choose the single best one to pass on to the next processing step?


Solution

  • You need to define a measure of similarity between the original image and the binarized images, and then select the thresholding method that maximizes that measure.

    Demo

    The following code simply aims at putting you on the right track. Notice that the function similarity returns a random number rather than a sensible similarity measure. You should implement it on your own or replace it by an appropriate function.

    import numpy as np
    from skimage.data import text
    import skimage.filters
    import matplotlib.pyplot as plt

    # The seven global thresholding methods tried by try_all_threshold
    threshold_methods = [skimage.filters.threshold_otsu,
                         skimage.filters.threshold_yen,
                         skimage.filters.threshold_isodata,
                         skimage.filters.threshold_li,
                         skimage.filters.threshold_mean,
                         skimage.filters.threshold_minimum,
                         skimage.filters.threshold_triangle,
                         ]

    def similarity(img, threshold_method):
        """Similarity measure between the original image img and the
        result of applying threshold_method to it.
        """
        # Placeholder: replace with a meaningful similarity measure
        return np.random.random()

    img = text()
    results = np.asarray([similarity(img, f) for f in threshold_methods])
    best_index = np.argmax(results)  # the highest similarity wins
    best_method = threshold_methods[best_index]
    threshold = best_method(img)
    binary = img >= threshold

    fig, ax = plt.subplots(1, 1)
    ax.imshow(binary, cmap=plt.cm.gray)
    ax.axis('off')
    ax.set_title(best_method.__name__)
    plt.show()
    

    [Figure: binarized text image titled "isodata", the method selected in this run]

    Edit

    Obviously, it makes no sense to choose the thresholding method randomly (as I did in the toy example above). Instead, you should implement a measure that lets you automatically select the best-performing algorithm. One possible way to do so is to compute the misclassification error with respect to a reference (ground-truth) binarization, i.e. the percentage of background pixels wrongly assigned to foreground plus the percentage of foreground pixels wrongly assigned to background. As the misclassification error is a dissimilarity measure rather than a similarity measure, you have to select the method that minimizes it, like this:

    best_index = np.argmin(results)
    

    Take a look at this paper for details on this and other approaches to thresholding performance assessment.
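
The misclassification error described above could be sketched roughly as follows. This is only an illustration under stated assumptions: it needs a hand-labelled ground-truth mask (here called `reference`), it assumes the foreground is brighter than the background, and the names `misclassification_error`, `select_best_method`, `threshold_midrange` and `threshold_low_percentile` are hypothetical helpers, not part of scikit-image. The image is synthetic so that the sketch runs with NumPy alone.

```python
import numpy as np


def misclassification_error(binary, reference):
    """Fraction of pixels whose label in `binary` differs from `reference`.

    Both arguments are boolean arrays of the same shape (True = foreground).
    `reference` is assumed to be a ground-truth mask.
    """
    binary = np.asarray(binary, dtype=bool)
    reference = np.asarray(reference, dtype=bool)
    if binary.shape != reference.shape:
        raise ValueError("binary and reference must have the same shape")
    # Counts background->foreground and foreground->background mistakes alike.
    return np.count_nonzero(binary != reference) / reference.size


def select_best_method(img, reference, methods):
    """Return (best_method, errors); the lowest error wins."""
    errors = np.array(
        [misclassification_error(img >= m(img), reference) for m in methods]
    )
    return methods[int(np.argmin(errors))], errors


# Synthetic stand-in for a real image: dark background, bright square, noise.
rng = np.random.default_rng(0)
reference = np.zeros((64, 64), dtype=bool)
reference[16:48, 16:48] = True
img = np.where(reference, 200.0, 50.0) + rng.normal(0, 10, reference.shape)


# Hypothetical threshold functions, used only to keep the sketch self-contained.
def threshold_midrange(image):
    return (image.min() + image.max()) / 2


def threshold_low_percentile(image):
    return np.percentile(image, 5)


methods = [threshold_midrange, threshold_low_percentile]
best_method, errors = select_best_method(img, reference, methods)
print(best_method.__name__, errors)
```

Any function that maps an image to a scalar threshold can be put in `methods`, so the `skimage.filters.threshold_*` functions from the demo should drop in unchanged. In practice the reference mask would come from a few manually binarized sample images, and the method selected on those samples would then be applied to the rest.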