
np.unique on a 3D array with axis=2 not working as expected


Consider the following code. With axis=2, I expected np.unique to reduce the duplicated [1 1] to [1], but it does not. Why doesn't it perform the unique operation along the 3rd axis?

import numpy as np

arr = np.array([[[1,1], [1,1], [1,1]],
                [[7,1], [10,1], [10,1]],
                [[1,1], [1,1], [1,1]]])

print(np.unique(arr, axis=0))
print("----------------")
print(np.unique(arr, axis=1))
print("----------------")
print(np.unique(arr, axis=2))

I tried many other examples, and it still does not work on the 3rd axis.


Solution

  • Note this from the documentation (citing help(np.unique)):

    The axis to operate on. If None, ar will be flattened. If an integer, the subarrays indexed by the given axis will be flattened and treated as the elements of a 1-D array with the dimension of the given axis […]

    When an axis is specified the subarrays indexed by the axis are sorted. […] The result is that the flattened subarrays are sorted in lexicographic order starting with the first element.

    So in your case, with axis=2, the elements being compared are the two sub-arrays along the last axis: arr[:, :, 0].flatten(), which is [1, 1, 1, 7, 10, 10, 1, 1, 1], and arr[:, :, 1].flatten(), which is [1, 1, 1, 1, 1, 1, 1, 1, 1].

    These two are not equal, so neither is removed. The only visible change is their order: they tie on the first three elements, and at the fourth element 1 < 7, so the second sub-array sorts before the first in the lexicographic comparison.

    I assume you wanted it to get rid of the duplicate [1, 1] entries. However, np.unique cannot work that way because these are arrays, not nested lists. Removing the duplicates would leave a different number of entries in arr[0] (one) than in arr[1] (two), and an array cannot have rows of different lengths.
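
The explanation above can be checked with a short sketch. It reuses arr from the question; the names first, second, unique_pairs and pairs_per_block are only illustrative. The last two parts are guesses at what was actually wanted (deduplicating the [x, y] pairs), not something the question states outright.

```python
import numpy as np

arr = np.array([[[1, 1], [1, 1], [1, 1]],
                [[7, 1], [10, 1], [10, 1]],
                [[1, 1], [1, 1], [1, 1]]])

# With axis=2, the "elements" compared are the two slices along the
# last axis, each flattened to a 1-D vector of 9 values.
first = arr[:, :, 0].flatten()
second = arr[:, :, 1].flatten()
slices_differ = not np.array_equal(first, second)
print(slices_differ)

# Both slices are kept because they differ, but they come back in
# lexicographic order, so the all-ones slice now sits at index 0.
result = np.unique(arr, axis=2)
print(result.shape)
print(result[:, :, 0])

# Assumed goal 1: unique [x, y] pairs over the whole array.
# Reshape to a 2-D list of pairs and deduplicate its rows.
unique_pairs = np.unique(arr.reshape(-1, arr.shape[-1]), axis=0)
print(unique_pairs)

# Assumed goal 2: unique pairs within each arr[i]. The counts differ
# per block, so the result is a Python list of arrays, not one ndarray.
pairs_per_block = [np.unique(block, axis=0) for block in arr]
print([p.shape[0] for p in pairs_per_block])
```

The per-block variant returns a list because arr[1] keeps two distinct pairs while arr[0] and arr[2] keep one each, which is exactly the ragged shape a single array cannot hold.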