
How to reduce the number of row repetitions in a numpy array


I want to clean my data by reducing the number of duplicate rows, but I do not want to delete ALL duplicates.

How can I get a numpy array that keeps only a certain number of duplicates of each row?

Suppose, I have

x = np.array([[1,2,3],[1,2,3],[5,5,5],[1,2,3],[1,2,3]])

and I set number of duplicates as 2.

And the output should be like

x
>>[[1,2,3],[1,2,3],[5,5,5]]

or

x
>>[[5,5,5],[1,2,3],[1,2,3]]

The row order does not matter in my task.


Solution

  • Even though appending to a list as an intermediate step is not always a good idea when you already have numpy arrays, in this case it is by far the cleanest way to do it:

    import numpy as np

    def n_uniques(arr, max_uniques):
        # Count how many times each unique row appears.
        uniq, cnts = np.unique(arr, axis=0, return_counts=True)
        arr_list = []
        for i in range(cnts.size):
            # Keep each unique row at most max_uniques times.
            num = min(cnts[i], max_uniques)
            arr_list.extend([uniq[i]] * num)
        return np.array(arr_list)
    
    x = np.array([[1,2,3],
                  [1,2,3],
                  [1,2,3],
                  [5,5,5],
                  [1,2,3],
                  [1,2,3],])
    
    reduced_arr = n_uniques(x, 2)
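
If you prefer to avoid the Python loop entirely, the same capping can be sketched with `np.minimum` and `np.repeat` (the name `cap_duplicates` is hypothetical, not from the answer above; note that `np.unique` sorts the rows, which is fine since order does not matter here):

```python
import numpy as np

def cap_duplicates(arr, max_dups):
    # Hypothetical vectorized variant: clip each row's count at
    # max_dups, then repeat each unique row that many times.
    uniq, cnts = np.unique(arr, axis=0, return_counts=True)
    return np.repeat(uniq, np.minimum(cnts, max_dups), axis=0)

x = np.array([[1, 2, 3], [1, 2, 3], [5, 5, 5], [1, 2, 3], [1, 2, 3]])
print(cap_duplicates(x, 2))
# [[1 2 3]
#  [1 2 3]
#  [5 5 5]]
```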