np.unique() can return the indices of first occurrence, the indices needed to reconstruct the input, and the occurrence counts. Is there any function or library that can do the same for any Python object?
Not as such. You can get similar functionality using different classes depending on your needs.
np.unique with no extra flags gives a result similar to set, except that np.unique returns its values sorted while a set has no defined order:
unique_value = set(x)
collections.Counter simulates return_counts:
import collections

# Counter keys keep first-occurrence order; elements must be hashable
counts = collections.Counter(x)
unique_values = list(counts.keys())
unique_counts = list(counts.values())
To mimic return_index, call list.index on each key of the set or Counter. This assumes that the container x is a list, and each call scans x from the start:
first_indices = [x.index(k) for k in counts]
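Because x.index rescans the list for every key, a single pass with a dict may be preferable for long inputs. The following is only a sketch of that alternative, not part of the approach above: the helper name first_occurrences is made up for illustration, and the elements are assumed to be hashable.

```python
def first_occurrences(x):
    """Map each distinct element of x to the index where it first appears."""
    first = {}
    for i, value in enumerate(x):
        # setdefault stores the index only the first time a value is seen
        first.setdefault(value, i)
    return first


x = ["b", "a", "b", "c", "a"]
first = first_occurrences(x)
unique_values = list(first)            # keys, in first-occurrence order
first_indices = list(first.values())   # index of each key's first occurrence
```

Unlike np.unique, this keeps the values in first-occurrence order rather than sorted order.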
To simulate return_inverse, we look at how unique is actually implemented. unique sorts the input to get the runs of equal elements. A similar technique can be achieved via sorted (or in-place list.sort) and itertools.groupby:
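import itertools
import operator

# Pair each element with its original index, then sort so equal values are adjacent
s = sorted(zip(x, itertools.count()))
inverse = [0] * len(x)
for i, (k, g) in enumerate(itertools.groupby(s, operator.itemgetter(0))):
    for v in g:
        inverse[v[1]] = i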
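One way to sanity-check the inverse is to rebuild the input from the unique values, which is the purpose of return_inverse in np.unique. The sketch below wraps the same loop in a hypothetical helper, unique_inverse, and assumes the elements are mutually orderable so that sorted works.

```python
import itertools
import operator


def unique_inverse(x):
    """Return (sorted unique values, inverse indices) for a list x."""
    # Sorting (value, position) pairs places equal values next to each other
    s = sorted(zip(x, itertools.count()))
    values = []
    inverse = [0] * len(x)
    for i, (k, g) in enumerate(itertools.groupby(s, operator.itemgetter(0))):
        values.append(k)
        for _, pos in g:
            # Every original position in this run maps to unique value i
            inverse[pos] = i
    return values, inverse


x = ["b", "a", "b", "c", "a"]
values, inverse = unique_inverse(x)
# Indexing the unique values with the inverse reproduces the original list
rebuilt = [values[i] for i in inverse]
```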
In fact, the groupby approach encodes all the options:
import itertools
import operator

s = sorted(zip(x, itertools.count()))
unique_values = []
first_indices = []
unique_counts = []
inverse = [0] * len(x)
for i, (k, g) in enumerate(itertools.groupby(s, operator.itemgetter(0))):
    unique_values.append(k)
    count = 1
    # Each item in g is (value, original_index); the first one has the smallest index
    v = next(g)
    inverse[v[1]] = i
    first_indices.append(v[1])
    for v in g:
        inverse[v[1]] = i
        count += 1
    unique_counts.append(count)
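The sorting approach needs elements that can be compared with each other. For objects that are hashable but not orderable, a dict-based single pass can produce the same four outputs. This is a sketch of a different technique, not how np.unique works: the name unique_hashable is hypothetical, and the values come out in first-occurrence order rather than sorted.

```python
def unique_hashable(x):
    """One-pass analogue of the groupby loop for hashable elements.

    Returns (values, first_indices, counts, inverse), with values in
    first-occurrence order rather than sorted order.
    """
    slot = {}  # value -> its position in the output lists
    values, first_indices, counts, inverse = [], [], [], []
    for i, item in enumerate(x):
        j = slot.get(item)
        if j is None:
            # First time this value is seen: open a new slot for it
            j = slot[item] = len(values)
            values.append(item)
            first_indices.append(i)
            counts.append(0)
        counts[j] += 1
        inverse.append(j)
    return values, first_indices, counts, inverse


x = ["b", 1, "b", None, 1]  # mixed types: sorted(x) would raise TypeError
values, first_indices, counts, inverse = unique_hashable(x)
```

The trade-off is ordering: if sorted output matters and the elements are orderable, the groupby version is the closer match to np.unique.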