Numpy unique 2D sub-array


I have a 3D numpy array and I want to keep only the unique 2D sub-arrays.

Input:

[[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]

 [[ 5  6]
  [ 7  8]]]

Output:

[[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]]

I tried converting each sub-array to a byte string (with the tostring() method) and then using np.unique. However, once the byte strings are put into a numpy array, the trailing \x00 bytes are dropped, so I can't convert them back with np.fromstring().

Example:

import numpy as np
a = np.array([[[1,2],[3,4]],[[5,6],[7,8]],[[9,10],[11,12]],[[5,6],[7,8]]])
b = [x.tostring() for x in a]
print(b)
c = np.array(b)
print(c)
print(np.array([np.fromstring(x) for x in c]))

Output:

[b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00', b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00', b'\t\x00\x00\x00\n\x00\x00\x00\x0b\x00\x00\x00\x0c\x00\x00\x00', b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08\x00\x00\x00']
[b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04'
 b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08'
 b'\t\x00\x00\x00\n\x00\x00\x00\x0b\x00\x00\x00\x0c'
 b'\x05\x00\x00\x00\x06\x00\x00\x00\x07\x00\x00\x00\x08']

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-86-6772b096689f> in <module>()
      5 c = np.array(b)
      6 print(c)
----> 7 print(np.array([np.fromstring(x) for x in c]))

<ipython-input-86-6772b096689f> in <listcomp>(.0)
      5 c = np.array(b)
      6 print(c)
----> 7 print(np.array([np.fromstring(x) for x in c]))

ValueError: string size must be a multiple of element size
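
For reference, the truncation seen above is a property of NumPy's fixed-width bytes dtype, which drops trailing null bytes when elements are read back. A minimal sketch of the same byte-string idea that avoids the problem by keeping the keys in a plain Python dict instead of a numpy array (the names `first_seen` and `unique_subarrays` are illustrative, and `tobytes()` is used as the current name for `tostring()`):

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]],
              [[9, 10], [11, 12]], [[5, 6], [7, 8]]])

# Python bytes objects keep their trailing null bytes, and a dict
# preserves insertion order, so the first occurrence of each key wins.
first_seen = {}
for sub in a:
    first_seen.setdefault(sub.tobytes(), sub)

# Stack the surviving 2D sub-arrays back into one 3D array.
unique_subarrays = np.array(list(first_seen.values()))
print(unique_subarrays)
```

This loops in Python, so it is likely slower than a vectorized approach on large arrays, but the sub-arrays come back in their original order.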

I also tried view, but I really don't know how to use it. Can you help me, please?


Solution

  • Building on @Jaime's post, this solution finds the unique 2D sub-arrays by adding a reshape before the view step: each sub-array is flattened into one row, and each row is then viewed as a single void element that np.unique can compare as a whole -

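    import numpy as np

    def unique2D_subarray(a):
        # Void dtype wide enough to hold all the bytes of one sub-array
        dtype1 = np.dtype((np.void, a.dtype.itemsize * np.prod(a.shape[1:])))
        b = np.ascontiguousarray(a.reshape(a.shape[0], -1)).view(dtype1)
        # Index a with the position of one occurrence of each distinct sub-array
        return a[np.unique(b, return_index=True)[1]]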
    

    Sample run -

    In [62]: a
    Out[62]: 
    array([[[ 1,  2],
            [ 3,  4]],
    
           [[ 5,  6],
            [ 7,  8]],
    
           [[ 9, 10],
            [11, 12]],
    
           [[ 5,  6],
            [ 7,  8]]])
    
    In [63]: unique2D_subarray(a)
    Out[63]: 
    array([[[ 1,  2],
            [ 3,  4]],
    
           [[ 5,  6],
            [ 7,  8]],
    
           [[ 9, 10],
            [11, 12]]])
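
As an alternative to the view-based function above, NumPy 1.13 and later accept an `axis` argument in `np.unique`, which treats each slice along that axis as one item. A minimal sketch, assuming such a NumPy version is available (the names `uniq` and `in_order` are illustrative):

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]],
              [[9, 10], [11, 12]], [[5, 6], [7, 8]]])

# Unique 2D sub-arrays along the first axis; the result is sorted.
uniq = np.unique(a, axis=0)

# To keep the order of first appearance instead, take the indices of
# the first occurrences and sort those indices before indexing a.
_, idx = np.unique(a, axis=0, return_index=True)
in_order = a[np.sort(idx)]

print(uniq)
print(in_order)
```

For this input both results happen to match, because the sub-arrays already appear in sorted order.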