Search code examples
pythoncomputer-visionscikit-imagemahotasglcm

Mahotas library for GLCM calulation and window size


I am using mahotas library to do texture analysis (GLCM) on a satellite image (250 x 200 pixels). GLCM calculations are carried out within a window size. So, for two adjacent positions of the sliding window we need to compute two co-occurrence matrices from scratch. I have read that I can set step size as well in order to avoid computing GLCM on overlapping areas. I have provided the code below.

#Compute haralick features
def haralick_feature(image):
    haralick = mahotas.features.haralick(image, True)
    return haralick


img = 'SAR_image.tif'
win_s=32 #window size
step=32 #step size

rows = img.shape[0]
cols = img.shape[1]
array = np.zeros((rows,cols), dtype= object)
harList = []

for i in range(0, rows-win_s-1, step):
        print 'Row number: ', r
    for j in range(0, cols-win_s-1, step):
        harList.append(haralick_feature(image))

harImages = np.array(harList)     
harImages_mean = harImages.mean(axis=1)

For the code above, I have set the window and step size to 32. When the code finishes, I get an image with dimensions 6 x 8 (instead of 250 x 200), it makes sense as the step size has been set to 32.

So, my question is: By setting a step size (to avoid computation in overlapping regions as well as the code gets faster) can I somehow derive the GLCM results for the whole image with dimensions 250 x 200 instead of having a subset of it (6 x 8 dimensions)? Or I have no option but looping over the image with the normal way (without setting a step size)?


Solution

  • You cannot do that using mahotas as the function which computes the co-occurrence map is not available in this library. An alternative approach to the extraction of texture features from the GLCM would consist in utilizing skimage.feature.graycoprops (check out this thread for details).

    But if you wish to stick to mahotas, you should try using skimage.util.view_as_windows rather than a sliding window as it could speed up the scanning across the image. Please, be sure to read the caveat about memory usage at the end of the documentation. If using view_as_windows is an affordable approach for you, the following code gets the job done:

    import numpy as np
    from skimage import io, util
    import mahotas.features.texture as mht
    
    def haralick_features(img, win, d):
        win_sz = 2*win + 1
        window_shape = (win_sz, win_sz)
        arr = np.pad(img, win, mode='reflect')
        windows = util.view_as_windows(arr, window_shape)
        Nd = len(d)
        feats = np.zeros(shape=windows.shape[:2] + (Nd, 4, 13), dtype=np.float64)
        for m in xrange(windows.shape[0]):
            for n in xrange(windows.shape[1]):
                for i, di in enumerate(d):
                    w = windows[m, n, :, :]
                    feats[m, n, i, :, :] = mht.haralick(w, distance=di)
        return feats.reshape(feats.shape[:2] + (-1,))
    

    DEMO

    For the sample run below I have set win to 19 which corresponds to a window of shape (39, 39). I have considered two different distances. Notice that mht.haralick yields 13 GLCM features for four directions. Altogether this results in a 104-dimensional feature vector for each pixel. When applied to a (250, 200) pixels crop from a Landsat image, feature extraction finishes in around 7 minutes.

    In [171]: img = io.imread('landsat_crop.tif')
    
    In [172]: img.shape
    Out[172]: (250L, 200L)
    
    In [173]: win = 19
    
    In [174]: d = (1, 2)
    
    In [175]: %time feature_map = haralick_features(img, win, d)
    Wall time: 7min 4s
    
    In [176]: feature_map.shape
    Out[176]: (250L, 200L, 104L)
    
    In [177]: feature_map[0, 0, :]
    Out[177]: 
    array([  8.19278030e-03,   1.30863698e+01,   7.64234582e-01, ...,
             3.59561817e+00,  -1.35383606e-01,   8.32570045e-01])
    
    In [178]: io.imshow(img)
    Out[178]: <matplotlib.image.AxesImage at 0xc5a9b38>
    

    Landsat image