I am trying to get the indices of unique elements of a numpy array (long vector of 3628621 elements). However, I must do something wrong, because when I try to select the unique elements I am still finding duplicates:
Vector
Out[165]: array([712450, 714390, 718560, ..., 384390, 992041, 94852])
Loc = np.where(np.unique(Vector)) # Find indices of unique elements
Vector_New = Vector[Loc] # Create new vector with all unique elements
np.where(Vector_New == 173020) # See how often/where '173020' exists
Out[166]: (array([ 7098, 11581], dtype=int64),)
So, the integer '173020' exists still twice in the new vector, although I expected that all elements should be unique. The new vector is 11594 elements long.
Thanks for the help!
Regards, Timen
np.unique
has several parameters that can be activated and will give you the needed information. It's calling signature is:
np.unique(ar, return_index=False, return_inverse=False, return_counts=False)
read the docs.
In [50]: keys
Out[50]:
array([1, 3, 5, 2, 0, 7, 4, 7, 7, 2, 7, 5, 5, 3, 6, 2, 3, 5, 5, 5, 6, 9, 6,
5, 2, 1, 6, 6, 5, 9, 9, 6, 5, 5, 9, 9, 6, 3, 7, 0, 5, 1, 7, 6, 2, 4,
1, 0, 6, 5, 4, 8, 8, 4, 2, 1, 8, 3, 1, 9, 8, 4, 4, 2, 4, 7, 2, 6, 8,
6, 5, 2, 4, 9, 1, 5, 3, 1, 5, 6, 2, 2, 8, 4, 0, 4, 9, 0, 8, 1, 5, 3,
1, 3, 7, 1, 5, 8, 5, 8])
In [51]: np.unique(keys, return_counts=True, return_index=True)
Out[51]:
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
array([ 4, 0, 3, 1, 6, 2, 14, 5, 51, 21], dtype=int32),
array([ 5, 11, 11, 8, 10, 18, 12, 8, 9, 8]))