Tags: python, gpu, unique, axis, cupy

Is there a CuPy version that supports the axis option of cupy.unique()? Any workaround?


I'm looking for a GPU (CuPy) counterpart of numpy.unique() that supports the axis option.

I have a 2D CuPy array from which I need to remove duplicate rows. Unfortunately, cupy.unique() flattens the array and returns a 1D array of unique values. I'm looking for something like numpy.unique(arr, axis=0), but CuPy does not support the axis option yet.

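import cupy as cp
import numpy as np

x = cp.array([[1, 2, 3, 4], [4, 5, 6, 7], [1, 2, 3, 4], [10, 11, 12, 13]])
y = cp.unique(x)  # flattens: returns a 1D array of unique values
y_r = np.unique(cp.asnumpy(x), axis=0)  # the desired result, but computed on the CPU

print('The 2D array:\n', x)
print('Required:\n', y_r, 'But using CuPy')
print('The flattened unique array:\n', y)

print('Error producing line:', cp.unique(x, axis=0))  # fails: axis is not supported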

I expect a 2D array of unique rows, but I get a 1D array of unique numbers instead. Any ideas on how to implement this with CuPy or Numba?

Solution

  • As of CuPy 8.0.0b2, cupy.lexsort is correctly implemented, and it can serve as a workaround (albeit probably not the most efficient one) for cupy.unique with the axis argument.

    Assuming the array is 2D and that you want the unique rows, i.e. the unique elements along axis 0 (otherwise transpose or swap axes as appropriate):

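        import cupy

        ###################################
        # replacement for numpy.unique with option axis=0
        ###################################

        def cupy_unique_axis0(array):
            if array.ndim != 2:
                raise ValueError("Input array must be 2D.")
            if array.shape[0] == 0:
                return array.copy()  # no rows, nothing to deduplicate
            # lexsort uses the LAST key as the primary key, so the columns are
            # reversed to sort rows by column 0, then column 1, and so on.
            sortarr = array[cupy.lexsort(array.T[::-1])]
            # Keep the first row and every row that differs from its predecessor.
            mask = cupy.empty(array.shape[0], dtype=cupy.bool_)
            mask[0] = True
            mask[1:] = cupy.any(sortarr[1:] != sortarr[:-1], axis=1)
            return sortarr[mask]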
    

    If you also need the return_index, return_inverse, or return_counts arguments, check the original cupy.unique source code, on which this is based. I don't need those myself.
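The array.T[::-1] in cupy_unique_axis0 is there because lexsort treats its last key as the primary sort key. A small check of that convention is sketched below with NumPy so it runs without a GPU; it assumes cupy.lexsort follows the same key order, since it is meant to mirror numpy.lexsort. The rows array is made-up example data.

```python
import numpy as np

# Hypothetical example rows.
rows = np.array([[2, 0], [1, 9], [1, 3]])

# Reversed columns: column 0 becomes the last (primary) key,
# so rows are ordered by column 0, then column 1.
order = np.lexsort(rows.T[::-1])
print(rows[order])  # [[1 3] [1 9] [2 0]]

# Without the reversal, column 1 is the primary key instead.
wrong = np.lexsort(rows.T)
print(rows[wrong])  # [[2 0] [1 3] [1 9]]
```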
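For anyone who does need the return_* outputs, one possible way to derive them from the same sort-and-mask steps is sketched below. It is not CuPy's own implementation: unique_rows is a hypothetical name, and the xp parameter is an assumption that lets the same code run with NumPy (to check it on the CPU) or with CuPy (pass xp=cupy). The return_index result relies on lexsort being stable, which is documented for numpy.lexsort and only assumed here for cupy.lexsort.

```python
import numpy as np


def unique_rows(array, xp=np):
    """Hypothetical helper: unique rows plus index, inverse and counts.

    Pass xp=cupy for GPU arrays. Assumes xp.lexsort is a stable sort.
    """
    if array.ndim != 2:
        raise ValueError("Input array must be 2D.")
    n = array.shape[0]
    if n == 0:
        empty = xp.empty(0, dtype=xp.int64)
        return array.copy(), empty, empty, empty
    # Sort rows lexicographically (the last lexsort key is the primary one).
    perm = xp.lexsort(array.T[::-1])
    sortarr = array[perm]
    # True where a sorted row starts a new group of identical rows.
    mask = xp.empty(n, dtype=bool)
    mask[0] = True
    mask[1:] = xp.any(sortarr[1:] != sortarr[:-1], axis=1)
    unique = sortarr[mask]
    # return_index: original position of the first row in each group.
    index = perm[mask]
    # return_inverse: group id of each sorted row, scattered back to input order.
    inverse = xp.empty(n, dtype=xp.int64)
    inverse[perm] = xp.cumsum(mask) - 1
    # return_counts: distance between consecutive group starts.
    starts = xp.nonzero(mask)[0]
    counts = xp.diff(xp.concatenate([starts, xp.array([n])]))
    return unique, index, inverse, counts


x = np.array([[1, 2, 3, 4], [4, 5, 6, 7], [1, 2, 3, 4], [10, 11, 12, 13]])
unique, index, inverse, counts = unique_rows(x)
print(unique)
print(index, inverse, counts)
```

Comparing the output with np.unique(x, axis=0, return_index=True, return_inverse=True, return_counts=True) is a quick way to validate the sketch on the CPU before switching to xp=cupy.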