python algorithm numpy unique

Is there any function equivalent to np.unique for generic objects in Python?


np.unique() can return the indices of first occurrence, the indices needed to reconstruct the input, and the occurrence counts. Is there any function or library that can do the same for arbitrary Python objects?


Solution

  • Not as such. You can get similar functionality using different classes depending on your needs.

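    unique with no extra flags gives a result similar to set, except that a set is unordered (np.unique returns sorted values) and its elements must be hashable: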

    unique_value = set(x)
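
Since a set has no defined order, here is a minimal sketch of two ordered alternatives. It assumes the elements are hashable and, for the sorted variant, mutually comparable. The sample list x is made up for illustration.

```python
x = ["b", "a", "c", "a", "b"]  # hypothetical sample input

# Sorted unique values, matching the ordering np.unique produces.
sorted_unique = sorted(set(x))

# Unique values in order of first appearance
# (dicts preserve insertion order in Python 3.7+).
first_seen_unique = list(dict.fromkeys(x))

print(sorted_unique)      # ['a', 'b', 'c']
print(first_seen_unique)  # ['b', 'a', 'c']
```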
    

    collections.Counter simulates return_counts:

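    import collections

    counts = collections.Counter(x)
    unique_values = list(counts.keys())    # distinct elements, in first-seen order
    unique_counts = list(counts.values())  # occurrences, aligned with unique_values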
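
As a small worked example of the Counter approach (the sample list is hypothetical), the keys and values line up position by position. Note that they come out in first-seen order, not in the sorted order np.unique would give:

```python
import collections

x = ["b", "a", "c", "a", "b"]  # hypothetical sample input

counts = collections.Counter(x)
unique_values = list(counts.keys())
unique_counts = list(counts.values())

print(unique_values)  # ['b', 'a', 'c']
print(unique_counts)  # [2, 2, 1]
```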
    

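    To mimic return_index, call list.index once for each key of the set or Counter. This assumes that x is a list (or another sequence with an index method); each call scans x from the start, so it can be slow when there are many distinct values: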

    first_indices = [x.index(k) for k in counts]
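
If the repeated scans are a concern, the first indices can also be collected in a single pass. This is a sketch of an alternative, not what the answer above uses, and it assumes the elements are hashable. The names first_index and the sample data are illustrative only.

```python
x = ["b", "a", "c", "a", "b"]  # hypothetical sample input

first_index = {}
for i, value in enumerate(x):
    # setdefault stores the index only the first time a value is seen.
    first_index.setdefault(value, i)

unique_values = list(first_index)           # keys, in first-seen order
first_indices = list(first_index.values())  # matching first positions

print(unique_values)  # ['b', 'a', 'c']
print(first_indices)  # [0, 1, 2]
```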
    

    To simulate return_inverse, we look at how unique is actually implemented: it sorts the input to get runs of equal elements. A similar technique can be achieved with sorted (or in-place list.sort) and itertools.groupby, provided the elements can be compared with one another:

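    import itertools
    import operator

    # Pair each element with its original position, then sort by element.
    s = sorted(zip(x, itertools.count()))
    inverse = [0] * len(x)
    for i, (k, g) in enumerate(itertools.groupby(s, operator.itemgetter(0))):
        for v in g:
            inverse[v[1]] = i  # v is (element, original_index)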
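
To make the meaning of inverse concrete: indexing the sorted unique values with it should rebuild the original input. The sketch below runs the same sort-and-group idea on a made-up list and checks that property.

```python
import itertools
import operator

x = ["b", "a", "c", "a", "b"]  # hypothetical sample input

s = sorted(zip(x, itertools.count()))
unique_values = []
inverse = [0] * len(x)
for i, (k, g) in enumerate(itertools.groupby(s, operator.itemgetter(0))):
    unique_values.append(k)
    for _, original_index in g:
        inverse[original_index] = i

print(unique_values)  # ['a', 'b', 'c']
print(inverse)        # [1, 0, 2, 0, 1]

# Reconstructing the input from the unique values and the inverse.
reconstructed = [unique_values[i] for i in inverse]
print(reconstructed == x)  # True
```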
    

    In fact, the groupby approach encodes all the options:

    s = sorted(zip(x, itertools.count()))
    unique_values = []
    first_indices = []
    unique_counts = []
    inverse = [0] * len(x)
    for i, (k, g) in enumerate(itertools.groupby(s, operator.itemgetter(0))):
        unique_values.append(k)
        count = 1
        v = next(g)
        inverse[v[1]] = i
        first_indices.append(v[1])  # v is (element, original_index)
        for v in g:
            inverse[v[1]] = i
            count += 1
        unique_counts.append(count)
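
For reuse, the combined loop can be wrapped in a function. The following is a sketch under stated assumptions: generic_unique is a hypothetical name, not a library function, and the elements must be mutually comparable so that sorted works. It is shown here on tuples rather than numbers.

```python
import itertools
import operator


def generic_unique(x):
    """Hypothetical helper: return (values, first_indices, inverse, counts)."""
    s = sorted(zip(x, itertools.count()))
    unique_values, first_indices, unique_counts = [], [], []
    inverse = [0] * len(x)
    for i, (k, g) in enumerate(itertools.groupby(s, operator.itemgetter(0))):
        # Original positions of this value; ascending because ties in the
        # sort are broken by the position stored in each pair.
        positions = [pos for _, pos in g]
        unique_values.append(k)
        first_indices.append(positions[0])
        unique_counts.append(len(positions))
        for pos in positions:
            inverse[pos] = i
    return unique_values, first_indices, inverse, unique_counts


points = [(1, 2), (0, 0), (1, 2), (3, 1), (0, 0), (1, 2)]
values, first, inverse, counts = generic_unique(points)
print(values)   # [(0, 0), (1, 2), (3, 1)]
print(first)    # [1, 0, 3]
print(inverse)  # [1, 0, 1, 2, 0, 1]
print(counts)   # [2, 3, 1]
```

The four results come back in the sorted order of the unique values, which mirrors how np.unique orders its outputs.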