Search code examples
numpyvectorizationmatrix-multiplicationsvd

Numpy calculations always differ per function call inside loop for SVD reconstruction


my function that tries to reconstruct an image using SVD with 3 loops sometimes yields different results and tries to calculate the reconstructed image with bad values. Sometimes it works completely fine but if I repeatedly try to reconstruct the same image with the same parameters, it eventually shows bad results. This happens at random and probably not due to variables not being overwritten/deleted correctly after each function call. Sometimes I also get this error message:

/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:21: RuntimeWarning: invalid 
value encountered in add
/usr/local/lib/python3.7/dist-packages/matplotlib/image.py:452: RuntimeWarning: overflow 
encountered in double_scalars
  dv = np.float64(self.norm.vmax) - np.float64(self.norm.vmin)
/usr/local/lib/python3.7/dist-packages/matplotlib/image.py:455: RuntimeWarning: invalid 
value encountered in double_scalars
  newmin = vmid - dv * fact
/usr/local/lib/python3.7/dist-packages/matplotlib/colors.py:1026: RuntimeWarning: 
overflow encountered in double_scalars
  resdat /= (vmax - vmin)

Here is my code which reproduces this bug: https://www.kaggle.com/krajnovic/numpy-variance

I debugged it for quite some time and I think the issue happens at the last summation of the "reco" object.


Solution

  • You didn't paste your relevant code from kaggle here, so I am pasting it here for you.

    Your code:

    import os
    import imageio
    import numpy as np
    import matplotlib.pyplot as plt
    
    im = imageio.imread("/kaggle/input/numpy-variance111/m1-1_slice125.png")
    im = im -im.min() / im.max() - im.min()
    u,s,vt = np.linalg.svd(im, full_matrices=False)
    k = 20
    
    def reconstruct_svd_for_loops3(u,s,vt,k):
        """SVD reconstruction for k components using 3 for-loops
        for + in svd for zeile for spalte
        Inputs:
        u: (m,n) numpy array
        s: (n) numpy array (diagonal matrix)
        vt: (n,n) numpy array
        k: number of reconstructed singular components
        
        Ouput:
        (m,n) numpy array U_mk * S_k * V^T_nk for k reconstructed components
        """
        ### BEGIN SOLUTION
        reco = np.empty(u.shape)
        for i in range(k): #for k components
            sum_ = np.empty(u.shape)
            for j in range(u[:,i].shape[0]): #for each element in ith column of u
                for k in range(vt[i,:].shape[0]): #for each element in ith row of vt
                    sum_[j,k] = u[j,i] * vt[i,k]
            sum_ = s[i] * sum_
            reco += sum_
        ### END SOLUTION
        del sum_
        return reco
    

    Errors and corrections

    I have found some errors in your code, but only the first listed error is the one causing the randomness.

    1. The reason it works sometimes and sometimes not is because you are using np.empty, which initializes your reco and sum_ arrays to garbage values ranging from -inf to inf. Thus sometimes when are lucky you may happen to initialize with values that doesn't crash. You can fix thus by using np.zeros instead.

    2. Additionally, you are specifying incorrect shapes and you have also implemented your outer prdouct (the two inner loops) incorrectly. This does not cause the randomness though, but your reconstruction will be incorrect. See code below. Your code was fine. I misread some things the first time.

    3. Your image normalization is wrong, you are missing some parenthesis'. See code below.

    4. This isn't an error, but a suggestion; you can easily write all of this without any loops by utilizing NumPy broadcasting and matrix multiplication. See code below:

    Implementation of fixes and suggestions

    import numpy as np
    from PIL import Image
    from urllib import request
    from matplotlib import pyplot as plt
    
    url = "https://pbs.twimg.com/profile_images/656159226805399552/v6ffWuIc_400x400.jpg"
    img = np.array(Image.open(request.urlopen(url))) # Image of owl
    
    def normalize_image(X: np.ndarray):
        xmin = X.min()
        xmax = X.max()
        return (X - xmin) / (xmax - xmin)
    
    X = img[..., 0] # Pick a single channel
    X = normalize_image(X)
    
    def reconstruct_loops(u, s, vt, k):
        reco = np.zeros(u.shape)
        for i in range(k): #for k components
            sum_ = np.zeros(u.shape)
            for j in range(u[:,i].shape[0]): #for each element in ith column of u
                for k in range(vt[i,:].shape[0]): #for each element in ith row of vt
                    sum_[j,k] = u[j,i] * vt[i,k]
            sum_ = s[i] * sum_
            reco += sum_
        return reco
    
    def reconstruct_vectorized(u, s, vt, k):
        return (u[:,:k]*s[:k])@vt[:k,:]
    
    k = 10
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    reco_loop = reconstruct_loops(u, s, vt, k)
    reco_vector = reconstruct_vectorized(u, s, vt, k)
    
    fig, axes = plt.subplots(1,3)
    axes[0].set_title("Original")
    axes[0].set_axis_off()
    axes[0].imshow(X, cmap='gray')
    axes[1].set_title("Using\nloops")
    axes[1].set_axis_off()
    axes[1].imshow(normalize_image(reco_loop), cmap='gray')
    axes[2].set_title("Using numpy broadcasting\nand matrix\nmultiplication")
    axes[2].set_axis_off()
    axes[2].imshow(normalize_image(reco_vector), cmap='gray')
    plt.show()
    

    Result: recons