Distance matrix for images (with custom metric) in scipy/sklearn

Consider the following custom metric between 2 images (note: I know it is terrible, it is an example). For each 3*3 kernel above the images, it looks at the number of pixels matching and computes the ratio for that kernel. Finally, it outputs 1 - max(ratio).

def custom_metric(a, b):
    w, h = a.shape
    ratio = []
    for i in range(w // ksize):
        for j in range(h // ksize):
            kernel_diff = a[i*ksize:(i+1)*ksize, j*ksize:(j+1)*ksize] - b[i*ksize:(i+1)*ksize, j*ksize:(j+1)*ksize]
            ratio.append( np.mean(kernel_diff == 0) )
    return 1 - max(ratio)


a = np.array([
    [1,0,0,0,0,1],
    [0,0,1,0,0,0],
    [0,0,0,0,0,0],
    [0,0,0,0,1,0],
    [0,1,0,0,0,1],
    [0,0,0,0,0,0]
])

b = np.array([
    [0,1,0,0,0,0],
    [0,0,1,0,0,1],
    [1,0,0,0,1,0],
    [0,0,0,0,1,0],
    [0,0,0,1,0,1],
    [0,0,0,1,1,1]
])

print(custom_metric(a, b)) # outputs 1/9 because of the bottom left window

All the functions for computing distance matrices in scipy / sklearn that I have seen take as an input an array of shape (n_samples_X, n_features) like sklearn's pairwise_distances. Is there a function allowing higher dimensional arrays, for example of shape (n_samples_X, width, height) I could call my metric on?

Solution

Try flattening the features onto a single axis. Rather than having (samples, height, width, channels, ...) you'd reshape your data to a 2D array of (samples, height * width * channels * ...). This will work with an arbitrary number of dimensions which you can then supply to sklearn.

In your case, try a_reshaped = a.reshape([len(a), -1]). That'll give you a 2D array of shape (samples, height * width) by moving the height and width axes onto the same dimension.

Demonstrating how to flatten the height and width into a single dimension whilst preserving the contiguity of each patch. Also shows how to run the OP's metric on the flattened array.

#Reshapes array to 1 x features, preserving
#patch contiguity
def special_flatten(arr, k):
    kr, kc = k.tolist()
    nrow = arr.shape[0] // kr
    ncol = arr.shape[1] // kc
    return (
        arr
        .reshape(nrow, kr, ncol, kc)
        .swapaxes(1, 2)
        .reshape(nrow, ncol, kr * kc)
        .reshape(nrow * ncol, kr * kc)
        .reshape(1, -1)
    )

#Computes OP's custom metric for a single sample
# but using the reshaped array
def custom_metric(a, b):
    k_area = k.prod()
    sample_num = 0
    ratio = []
    for patch_num in range(a.shape[1] // k_area):
        start = patch_num * k_area
        end = start + k_area
        
        a_vals = a[sample_num, start:end]
        b_vals = b[sample_num, start:end]
        diff = a_vals - b_vals
        ratio.append( (diff == 0).mean() )
    
    return 1 - max(ratio)

Usage:

a = np.array([
    [1,0,0,0,0,1],
    [0,0,1,0,0,0],
    [0,0,0,0,0,0],
    [0,0,0,0,1,0],
    [0,1,0,0,0,1],
    [0,0,0,0,0,0]
])

b = np.array([
    [0,1,0,0,0,0],
    [0,0,1,0,0,1],
    [1,0,0,0,1,0],
    [0,0,0,0,1,0],
    [0,0,0,1,0,1],
    [0,0,0,1,1,1]
])

k = np.array([3, 3]) #the kernel
a_flat = special_flatten(a, k) #reshape to 1 x feats
b_flat = special_flatten(b, k)
custom_metric(a_flat, b_flat) # 1 / 9

Not required for the above, but some additional test code I used to sanity check the reshape results:

a = np.array([[1.1, 1.2,  2.1, 2.2],
              [1.3, 1.4,  2.3, 2.4],
              
              [3.1, 3.2,  4.1, 4.2],
              [3.3, 3.4,  4.3, 4.4]])

kr = kc = 2
nrow = a.shape[0] // kr
ncol = a.shape[1] // kc
a_r = (
    a
    .reshape(nrow, kr, ncol, kc)
    .swapaxes(1, 2)
    .reshape(nrow, ncol, kr * kc)
    .reshape(nrow * ncol, kr * kc)
    .reshape(1, -1)
)
a_r
# a_r is 1.1 1.2 1.3 1.4  2.1 2.2 2.3 2.4  3.1 3.2 3.3 3.4  4.1 4.2 4.3 4.4

patch_area = kr * kc
for patch_num in range(a_r.size // patch_area):
    print(patch_num, ':', a_r[0, patch_num * patch_area:(patch_num + 1)*patch_area] )

#Yields:
#
# 0 : [1.1 1.2 1.3 1.4]
# 1 : [2.1 2.2 2.3 2.4]
# 2 : [3.1 3.2 3.3 3.4]
# 3 : [4.1 4.2 4.3 4.4]