
numpy.unique gives non-unique output?


I am trying to get the indices of the unique elements of a numpy array (a long vector of 3628621 elements). However, I must be doing something wrong, because when I select the supposedly unique elements I still find duplicates:

Vector
Out[165]: array([712450, 714390, 718560, ..., 384390, 992041,  94852])

Loc = np.where(np.unique(Vector))       # Find indices of unique elements
Vector_New = Vector[Loc]                # Create new vector with all unique elements
np.where(Vector_New == 173020)          # See how often/where '173020' exists
Out[166]: (array([ 7098, 11581], dtype=int64),)

So the integer '173020' still appears twice in the new vector, although I expected all of its elements to be unique. The new vector is 11594 elements long.

Thanks for the help!

Regards, Timen


Solution

  • np.unique has several optional parameters that make it return the information you need. Its call signature is:

    np.unique(ar, return_index=False, return_inverse=False, return_counts=False)
    

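    See the NumPy documentation for details (newer releases add further keyword arguments such as axis). For example, with return_index and return_counts enabled, the result is a tuple of the sorted unique values, the index of each value's first occurrence, and each value's count: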

    In [50]: keys
    Out[50]: 
    array([1, 3, 5, 2, 0, 7, 4, 7, 7, 2, 7, 5, 5, 3, 6, 2, 3, 5, 5, 5, 6, 9, 6,
           5, 2, 1, 6, 6, 5, 9, 9, 6, 5, 5, 9, 9, 6, 3, 7, 0, 5, 1, 7, 6, 2, 4,
           1, 0, 6, 5, 4, 8, 8, 4, 2, 1, 8, 3, 1, 9, 8, 4, 4, 2, 4, 7, 2, 6, 8,
           6, 5, 2, 4, 9, 1, 5, 3, 1, 5, 6, 2, 2, 8, 4, 0, 4, 9, 0, 8, 1, 5, 3,
           1, 3, 7, 1, 5, 8, 5, 8])
    In [51]: np.unique(keys, return_counts=True, return_index=True)
    Out[51]: 
    (array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
     array([ 4,  0,  3,  1,  6,  2, 14,  5, 51, 21], dtype=int32),
     array([ 5, 11, 11,  8, 10, 18, 12,  8,  9,  8]))
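
To connect this back to the question, here is a minimal sketch of why the original approach leaves duplicates and how return_index avoids them. The array below is a small made-up stand-in for Vector, and the variable names are only illustrative. np.unique returns the sorted unique values, not a mask or a set of positions, so np.where applied to it merely reports which of those values are nonzero. Indexing with that result picks the first few elements of the original array, which would be consistent with the 11594-element result in the question.

```python
import numpy as np

# Small stand-in for the question's Vector (values are made up for illustration)
vector = np.array([7, 3, 7, 5, 3, 9, 7])

# What the question did: np.unique returns the sorted unique VALUES [3, 5, 7, 9].
# np.where on that array only reports which of those values are nonzero,
# i.e. positions 0..3 here, so vector[loc] is just the first four elements.
loc = np.where(np.unique(vector))
wrong = vector[loc]  # [7, 3, 7, 5], still contains a duplicate

# return_index=True also returns the index of the first occurrence of each value
values, first_idx = np.unique(vector, return_index=True)

unique_sorted = vector[first_idx]             # same as values: [3, 5, 7, 9]
unique_in_order = vector[np.sort(first_idx)]  # original order: [7, 3, 5, 9]
```

If only the unique values are needed, np.unique(vector) alone is enough. The index array matters when the positions are needed, or when the order of first appearance should be preserved, as in the last line.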