
Calculating MSE between numpy arrays


The Scientific Question:

I have lots of 3D volumes, each containing a cylinder oriented 'upright' along the z axis. The volumes are incredibly noisy, so noisy that you cannot see the cylinder in any single one by eye. If I average together thousands of these volumes I can see the cylinder. Each volume contains a copy of the cylinder, but in a few cases the cylinder may not be oriented correctly, so I want a way of figuring out which ones those are.

The Solution I have come up with:

I have taken the averaged volume and projected it down the z and x axes (just summing the numpy array along an axis) so that I get a nice circle in one direction and a rectangle in the other. I then take each individual 3D volume and project it down the z axis. The SNR is still so bad that I cannot see a circle in a single projection, but if I average the 2D projections I begin to see a circle after a few hundred, and it is easy to see after the first 1000 are averaged. To score how well each volume matches, I figured I would calculate the MSE of each volume's z-projection against three other arrays: first the average projected down z, then the average projected down y or x, and finally an array of normally distributed noise.
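
A rough sketch of what I mean by averaging the projections (`volumes` is just an illustrative name for my list of equally-shaped 3D numpy arrays):

import numpy as np

# volumes: list of equally-shaped 3D numpy arrays (illustrative name)
running_sum = np.zeros_like(np.sum(volumes[0], 0), dtype=float)
for n, vol in enumerate(volumes, start=1):
    running_sum += np.sum(vol, 0)        # project each noisy volume down z
    running_average = running_sum / n    # the circle becomes visible as n grows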

Currently I have the following where RawParticle is the 3D data and Ave is the average:

def normalise(array):
    # Linearly rescale the array to the range [0, 1]
    amin = np.amin(array)
    amax = np.amax(array)
    normarray = (array - amin) / (amax - amin)

    return normarray

def Noise(mag):
    # 3D array of Gaussian noise with standard deviation mag
    NoiseArray = np.random.normal(0, mag, size=(200, 200, 200))
    return NoiseArray

# 3D volume (normally I use a for loop to iterate through all particles, but for this example just showing one)
RawParticleProjected = np.sum(RawParticle, 0)
RawParticleProjectedNorm = normalise(RawParticleProjected)
#Average
AveProjected = np.sum(Ave, 0)
AveProjectedNorm = normalise(AveProjected)
#Noise Array
NoiseArray = Noise(0.5)
NoiseNorm = normalise(NoiseArray)


#Mean squared error
MSE = (np.square(np.subtract(RawParticleProjectedNorm, AveProjectedNorm))).mean()

I then repeat this with Ave summed down axis 1, and then again comparing the raw particle to the noise array.
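
Written out, the three comparisons look roughly like this (a sketch reusing the names above; projecting the noise volume down z in the same way is my assumption):

# Sketch of the three comparisons, reusing normalise(), RawParticle, Ave and Noise() from above
raw_z = normalise(np.sum(RawParticle, 0))      # particle projected down z
ave_z = normalise(np.sum(Ave, 0))              # average projected down z -> circle
ave_y = normalise(np.sum(Ave, 1))              # average projected down y -> rectangle
noise_2d = normalise(np.sum(Noise(0.5), 0))    # noise volume projected the same way (assumption)

mse_circle = np.mean((raw_z - ave_z) ** 2)
mse_rect = np.mean((raw_z - ave_y) ** 2)
mse_noise = np.mean((raw_z - noise_2d) ** 2)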

However, my output gives the highest MSE when I compare the projections that should both be circles, as shown below:

[Image: MSE values for the three comparisons]

My understanding of MSE is that the two populations that do not match should have high MSE and the populations that agree should have low MSE. Perhaps my data is too noisy for this type of analysis? But if that is true, then I don't really know how else to approach the problem.

If anyone could glance at my code or correct my understanding of MSE, I would be super appreciative.

Thank you for taking the time to look and read.


Solution

  • If I understood your question correctly, you want to figure out how close your different samples are to the average, and by comparing the samples you want to find the outliers which contain a misoriented cylinder. This fits the definition of the L2 norm pretty well, so MSE should work here.

    I would calculate the average 3D image of all samples and then compute the distance of each sample to this average. Then I would just compare those values.

    The idea of comparing the samples to an image of artificial noise is not bad, but I am not sure if a normal distribution and your normalization work out as you planned. It could be apples and oranges. And I don't think it is a good idea to look at projections along different axes; just compare the 3D images, as in the sketch below.
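
    A minimal sketch of that idea (assuming `volumes` is a list of equally-shaped 3D numpy arrays; the name is made up here):

    import numpy as np

    # volumes: list of equally-shaped 3D numpy arrays (hypothetical name)
    average = np.mean(volumes, axis=0)                   # average 3D image of all samples

    # per-voxel mean squared error of each sample against the average
    scores = [np.mean((vol - average) ** 2) for vol in volumes]
    # samples with unusually large scores are candidates for a misoriented cylinder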

    I made some small tests with a circle in 2D, with a parameter alpha which indicates how much noise and how much circle there is in a picture (alpha=0 means only noise, alpha=1 means only circle).

    import numpy as np
    import matplotlib.pyplot as plt
    
    grid_size = 20
    radius = 5
    mag = 1
    
    def get_circle_stencil(radius):
        xx, yy = np.meshgrid(np.linspace(-grid_size/2+1/2, grid_size/2-1/2, grid_size),
                             np.linspace(-grid_size/2+1/2, grid_size/2-1/2, grid_size))
        dist = np.sqrt(xx**2 + yy**2)
        inner = dist < (radius - 1/2)
        return inner.astype(float)
    
    def create_noise(mag, n_dim=2):
        # return np.random.normal(0, mag, size=(grid_size,)*n_dim)
        return np.random.uniform(0, mag, size=(grid_size,)*n_dim)
    
    def create_noisy_sample(alpha, n_dim=2):
        return (np.random.uniform(0, 1-alpha, size=(grid_size,)*n_dim) + 
                alpha*get_circle_stencil(radius))
    
    
    fig = plt.figure()
    ax = fig.subplots(nrows=3, ncols=3)
    alpha_list = np.arange(9) / 10
    for i, alpha in enumerate(alpha_list):
        r, c = np.unravel_index(i, shape=(3, 3))
        # imshow rescales automatically, so no explicit normalization is needed here
        ax[r][c].imshow(create_noisy_sample(alpha=alpha), cmap='Greys')
        ax[r][c].set_title(f"alpha={alpha}")
        ax[r][c].xaxis.set_ticklabels([])
        ax[r][c].yaxis.set_ticklabels([])
    

    [Figure: Comparison of noise and circle for different alpha values]

    Then I tried some metrics (MSE, cosine similarity, and binary cross entropy) and looked at how they behave for different values of alpha.

    def norm(*args):
        # scale each array by its L2 norm so the different metrics are comparable
        return [a / np.linalg.norm(a) for a in args]
    
    def cosim(a, b):
        # cosine similarity (the inputs are already scaled by norm())
        return np.sum(a * b)
    
    def mse(a, b):
        # Euclidean (L2) distance; for arrays of equal size this is a monotonic
        # function of the mean squared error, so it ranks samples the same way
        return np.sqrt(np.sum((a-b)**2))
    
    def bce(a, b):
        # binary cross entropy, implemented as in tensorflow / keras
        eps = 1e-7
        res = a * np.log(b + eps)
        res += (1 - a) * np.log(1 - b + eps)
        return np.mean(-res)
    

    I compared NoiseA-NoiseB, Circle-Circle, Circle-Noise, Noise-Sample, Circle-Sample

    alpha = 0.1
    noise = create_noise(mag=1)
    noise_b = create_noise(mag=1)
    circle = get_circle_stencil(radius=radius)
    sample = create_noise(mag=1) + alpha * circle
    
    print('NoiseA-NoiseB:', mse(*norm(noise, noise_b)))    # 0.718
    print('Circle-Circle:', mse(*norm(circle, circle)))    # 0.000
    print('Circle-Noise:', mse(*norm(circle, noise)))      # 1.168
    print('Noise-Sample:', mse(*norm(noise, sample)))      # 0.697
    print('Circle-Sample:', mse(*norm(circle, sample)))    # 1.100
    
    print('NoiseA-NoiseB:', cosim(*norm(noise, noise_b)))  # 0.741
    print('Circle-Circle:', cosim(*norm(circle, circle)))  # 1.000
    print('Circle-Noise:', cosim(*norm(circle, noise)))    # 0.317
    print('Noise-Sample:', cosim(*norm(noise, sample)))    # 0.757
    print('Circle-Sample:', cosim(*norm(circle, sample)))  # 0.393
    
    print('NoiseA-NoiseB:', bce(*norm(noise, noise_b)))    # 0.194
    print('Circle-Circle:', bce(*norm(circle, circle)))    # 0.057
    print('Circle-Noise:', bce(*norm(circle, noise)))      # 0.111
    print('Noise-Circle:', bce(*norm(noise, circle)))      # 0.636
    print('Noise-Sample:', bce(*norm(noise, sample)))      # 0.192
    print('Circle-Sample:', bce(*norm(circle, sample)))    # 0.104
    
    n = 1000
    ns = np.zeros(n)
    cs = np.zeros(n)
    for i, alpha in enumerate(np.linspace(0, 1, n)):
        sample = create_noisy_sample(alpha=alpha)
        ns[i] = mse(*norm(noise, sample))
        cs[i] = mse(*norm(circle, sample))
    
    fig, ax = plt.subplots()
    ax.plot(np.linspace(0, 1, n), ns, c='b', label='noise-sample')
    ax.plot(np.linspace(0, 1, n), cs, c='r', label='circle-sample')
    ax.set_xlabel('alpha')
    ax.set_ylabel('mse')
    ax.legend()
    

    [Figure: Comparison of mse, cosim, and bce for varying alphas]

    For your problem I would just look at the circle-sample comparison (red). Different samples will behave as if they have different alpha values and you can group them accordingly. And you should be able to detect outliers because they should have a higher MSE.

    You said you have to combine 100-1000 pictures to see the cylinder, which indicates you have a really small alpha value in your problem, but on average MSE should work.
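
    One way to turn per-sample scores into an outlier flag (assuming `scores` already holds the per-sample MSE values against the reference; the mean-plus-two-standard-deviations threshold is just an illustrative choice):

    # scores: 1D array of per-sample mse values against the circle/cylinder reference (assumed to exist)
    scores = np.asarray(scores)
    threshold = scores.mean() + 2 * scores.std()   # illustrative cut-off, not a hard rule
    outlier_idx = np.where(scores > threshold)[0]  # samples likely containing a misoriented cylinder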