NumPy: find the rows that appear only once


For example, I have several sub-arrays obtained by splitting one array A according to the chunk sizes in B:

A = np.array([[1,1,1],
              [2,2,2],
              [2,3,4],
              [5,8,10],
              [5,9,9],
              [7,9,6],
              [1,1,1],
              [2,2,2],
              [9,2,4],
              [9,3,6],
              [10,3,3],
              [11,2,2]])
B = np.array([5,7])
C = np.split(A,B.cumsum()[:-1])
>>> print(C)
[array([[ 1,  1,  1],
        [ 2,  2,  2],
        [ 2,  3,  4],
        [ 5,  8, 10],
        [ 5,  9,  9]]),
 array([[ 7,  9,  6],
        [ 1,  1,  1],
        [ 2,  2,  2],
        [ 9,  2,  4],
        [ 9,  3,  6],
        [10,  3,  3],
        [11,  2,  2]])]

How can I keep only the rows that appear exactly once across all the sub-arrays, i.e. drop every row that appears more than once? Since [1,1,1] and [2,2,2] each appear twice in C, the result I want is:

[array([[ 2,  3,  4],
        [ 5,  8, 10],
        [ 5,  9,  9]]),
 array([[ 7,  9,  6],
        [ 9,  2,  4],
        [ 9,  3,  6],
        [10,  3,  3],
        [11,  2,  2]])]

Solution

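  • You can use np.unique on the unsplit array A to count how often each row occurs, build a boolean mask of the rows that occur exactly once, and split that mask the same way as A: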

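    # index of the first occurrence and the occurrence count of each distinct row
    _, first, counts = np.unique(A, axis=0, return_index=True, return_counts=True)
    
    # True for the rows of A that occur exactly once
    keep = np.isin(np.arange(len(A)), first[counts == 1])
    
    out = [a[m] for a, m in zip(np.split(A, B.cumsum()[:-1]),
                                np.split(keep, B.cumsum()[:-1]))]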
    

    output:

    [array([[ 2,  3,  4],
            [ 5,  8, 10],
            [ 5,  9,  9]]),
     array([[ 7,  9,  6],
            [ 9,  2,  4],
            [ 9,  3,  6],
            [10,  3,  3],
            [11,  2,  2]])]
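
As a variation on the answer above, the same mask can probably be built a little more directly with return_inverse, which maps every row of A to its position among the distinct rows. The following is only a sketch: the function name split_keep_unique_rows and the parameter sizes are made up for illustration and are not part of NumPy or of the original answer.

```python
import numpy as np

def split_keep_unique_rows(A, sizes):
    """Split A into consecutive chunks of the given sizes (hypothetical helper),
    dropping every row that occurs more than once anywhere in A."""
    # inverse[k] is the position of row k among the distinct rows;
    # counts[j] is how many times distinct row j occurs in A.
    _, inverse, counts = np.unique(
        A, axis=0, return_inverse=True, return_counts=True
    )
    # np.ravel guards against NumPy versions that return a non-1-D inverse.
    keep = counts[np.ravel(inverse)] == 1
    # Same split points as in the question: cumulative sizes without the last.
    cuts = np.cumsum(sizes)[:-1]
    return [chunk[mask]
            for chunk, mask in zip(np.split(A, cuts), np.split(keep, cuts))]

A = np.array([[1, 1, 1], [2, 2, 2], [2, 3, 4], [5, 8, 10], [5, 9, 9],
              [7, 9, 6], [1, 1, 1], [2, 2, 2], [9, 2, 4], [9, 3, 6],
              [10, 3, 3], [11, 2, 2]])
parts = split_keep_unique_rows(A, [5, 7])
```

Here parts holds one array per chunk, with the rows [1,1,1] and [2,2,2] removed from both. Computing the mask on the full array before splitting is what makes a row count as a duplicate even when its two copies land in different sub-arrays.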