Difference between np.linalg.norm(a-b) and np.sqrt(np.sum(np.square(a-b)))?

I'm trying to find the Euclidean distance between two images for a K-nearest neighbor algorithm. However, upon exploring some distance functions, I'm facing this discrepancy.

norm1 = np.sqrt(np.sum(np.square(image1-image2)))
norm2 = np.linalg.norm(image1-image2)

These two lines give different results. When I try the same thing with simple 3D NumPy arrays, I get the same result from both, but with my images the answers differ. I'm not sure which one is correct to use, so any help is welcome. Thanks in advance!

Solution

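Indeed, the two give different results in your case even though the two approaches are mathematically equivalent. This is because image1 and image2 are likely of type uint8, and np.square does not cast its result to a bigger type, so values wrap around modulo 256. This means np.square simply gives wrong results because of overflow. In fact, the subtraction already gives wrong results, since any negative difference wraps around as well. np.linalg.norm converts integer input to floating point before squaring, so only the subtraction is wrong there, which is why the two numbers differ. You need to cast the inputs to a bigger type to avoid overflow. Here is an example: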

norm1 = np.sqrt(np.sum(np.square(image1.astype(np.int32)-image2.astype(np.int32))))
norm2 = np.linalg.norm(image1.astype(np.int32)-image2.astype(np.int32))

With that, you should get almost the same result from both (possibly differing by a few ULPs, which should be negligible here).
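To make the wraparound concrete, here is a minimal sketch on two tiny uint8 arrays. The values and the variable names (a, b, wrapped, and so on) are made up for illustration and stand in for image1 and image2:

```python
import numpy as np

# Hypothetical stand-ins for two uint8 images.
a = np.array([10, 200, 30], dtype=np.uint8)
b = np.array([20, 100, 40], dtype=np.uint8)

# uint8 subtraction wraps modulo 256: 10 - 20 becomes 246, not -10.
wrapped = a - b
print(wrapped)              # [246 100 246]

# np.square stays in uint8, so the squares wrap as well.
print(np.square(wrapped))   # [100  16 100]

# Both "distances" are wrong, and they disagree with each other:
# norm converts to float before squaring, np.square does not.
norm1_bad = np.sqrt(np.sum(np.square(wrapped)))
norm2_bad = np.linalg.norm(wrapped)

# Casting to a wider signed type first keeps the true differences.
diff = a.astype(np.int32) - b.astype(np.int32)   # [-10 100 -10]
norm1_ok = np.sqrt(np.sum(np.square(diff)))
norm2_ok = np.linalg.norm(diff)

print(norm1_bad, norm2_bad)
print(norm1_ok, norm2_ok)   # both equal sqrt(10200)
```

After the cast, both expressions agree on the true distance, while the uncast versions give two different wrong answers.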

Note that np.linalg.norm is likely faster because it should create fewer temporary arrays than the np.sqrt + np.sum + np.square chain.
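If you want to check the speed difference on your own machine, a rough sketch with timeit could look like this. The random 256x256 arrays and the helper names manual and builtin are assumptions for illustration, and the timings will vary with image size, dtype, and NumPy version:

```python
import timeit
import numpy as np

# Hypothetical random "images"; replace with your own arrays.
rng = np.random.default_rng(0)
image1 = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
image2 = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)

# Cast once up front so neither variant suffers from uint8 wraparound.
# int64 is used here so the sum of squares cannot overflow either.
diff = image1.astype(np.int64) - image2.astype(np.int64)

def manual(d):
    # Builds a temporary array for the squares before summing.
    return np.sqrt(np.sum(np.square(d)))

def builtin(d):
    return np.linalg.norm(d)

t_manual = timeit.timeit(lambda: manual(diff), number=200)
t_builtin = timeit.timeit(lambda: builtin(diff), number=200)

print(f"manual : {t_manual:.4f} s for 200 calls")
print(f"builtin: {t_builtin:.4f} s for 200 calls")
```

Measure with your real images before relying on either variant being faster.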