I'm trying to compute the Euclidean distance between two images for a k-nearest neighbors algorithm. However, while trying out some distance functions, I ran into this discrepancy:
norm1 = np.sqrt(np.sum(np.square(image1-image2)))
norm2 = np.linalg.norm(image1-image2)
These two lines give different results. When I try the same thing with simple 3D NumPy arrays, I get the same result from both, but with my images the answers differ. I'm not sure which one is correct to use, so any help is welcome. Thanks in advance!
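Indeed, the two give different results in your case even though the approaches are mathematically equivalent. This is because image1 and image2 are likely of type uint8, and np.square does not cast the result to a bigger type, so the squares overflow and wrap around modulo 256. This means using np.square on them simply gives wrong results. In fact, the subtraction already gives wrong results: in uint8, a negative difference such as 10 - 20 wraps around to 246. np.linalg.norm, on the other hand, converts integer input to floating point before squaring, so it only suffers from the wrapped subtraction. That is why the two results differ, and neither of them is correct. You need to cast the inputs to a bigger type to avoid these overflows. Here is an example: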
norm1 = np.sqrt(np.sum(np.square(image1.astype(np.int32)-image2.astype(np.int32))))
norm2 = np.linalg.norm(image1.astype(np.int32)-image2.astype(np.int32))
With that, you should get essentially the same result from both (possibly differing by a few ULPs, which is negligible here).
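To see the wrap-around concretely, here is a small sketch with two made-up 2x2 uint8 arrays (the values are arbitrary, chosen only to trigger the overflow). It runs both formulas on the raw uint8 data and again after casting. I cast to float64 here rather than int32: int32 is wide enough for each squared difference, but as far as I know np.sum accumulates int32 input in the platform's default integer, which is 32-bit on some setups (e.g. Windows with NumPy 1.x), so float64 sidesteps that question for large images.

```python
import numpy as np

# Two tiny made-up "images" with the usual 8-bit image dtype.
a = np.array([[10, 200], [0, 255]], dtype=np.uint8)
b = np.array([[20, 100], [255, 0]], dtype=np.uint8)

# uint8 subtraction wraps modulo 256: 10 - 20 becomes 246, 0 - 255 becomes 1.
diff = a - b
print(diff.tolist())             # [[246, 100], [1, 255]]

# np.square keeps the uint8 dtype, so the squares wrap around too.
print(np.square(diff).tolist())  # [[100, 16], [1, 1]]

# Naive versions: naive1 suffers both wraps, naive2 only the subtraction one
# (np.linalg.norm converts integer input to float before squaring).
naive1 = np.sqrt(np.sum(np.square(a - b)))
naive2 = np.linalg.norm(a - b)

# Cast before subtracting; both formulas now agree on the true distance.
a64, b64 = a.astype(np.float64), b.astype(np.float64)
norm1 = np.sqrt(np.sum(np.square(a64 - b64)))
norm2 = np.linalg.norm(a64 - b64)
print(naive1, naive2, norm1, norm2)
```

The last line should print roughly 10.86, 368.16, 374.37 and 374.37: the two naive values disagree with each other and with the true distance, while the cast versions match.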
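Note that np.linalg.norm is likely faster, since in this default case it computes the sum of squares with a single dot product instead of materializing the extra temporary array of squares that np.square creates before np.sum. It is not entirely free of temporaries, though: the subtraction still allocates one, and integer input is first copied to floating point.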
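Since the question is about k-nearest neighbors, here is a sketch of how the same fix might look when comparing one query image against many training images at once. The function name k_nearest and the (N, ...) stacked-array layout are my own assumptions, not something from the question. Each image is flattened and cast to float64 once, and np.linalg.norm with axis=1 returns one distance per training image.

```python
import numpy as np

def k_nearest(train_images: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k training images closest to `query`.

    Hypothetical helper. train_images: stack of shape (N, ...), e.g. uint8.
    query: a single image with the same trailing shape.
    """
    # Flatten each image to a vector and cast once to float64, so the
    # subtraction and squaring cannot wrap around like uint8 arithmetic.
    train = train_images.reshape(len(train_images), -1).astype(np.float64)
    q = query.reshape(-1).astype(np.float64)
    # Broadcasting subtracts q from every row; the norm along axis=1 gives
    # one Euclidean distance per training image.
    dists = np.linalg.norm(train - q, axis=1)
    # A full stable sort keeps this simple; np.argpartition would avoid
    # sorting everything when N is large.
    return np.argsort(dists, kind="stable")[:k]

# Usage: three made-up 1x2 "images" and one query.
train = np.array([[[0, 0]], [[10, 10]], [[250, 250]]], dtype=np.uint8)
query = np.array([[255, 255]], dtype=np.uint8)
print(k_nearest(train, query, 2))  # [2 1]
```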