python-3.x, numpy-ndarray, set-difference

How to get the set difference of two 2D numpy arrays, or an equivalent of np.setdiff1d for 2D arrays?


In "Get intersecting rows across two 2D numpy arrays", the intersecting rows are found with the function np.intersect1d. I changed that function to use np.setdiff1d to get the set difference, but it doesn't work properly. The code is as follows.

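import numpy as np

def set_diff2d(A, B):
    nrows, ncols = A.shape
    # Structured dtype with one field per column, so each row becomes one record
    dtype={'names':['f{}'.format(i) for i in range(ncols)],
          'formats':ncols * [A.dtype]}
    C = np.setdiff1d(A.view(dtype), B.view(dtype))

    # Convert the records back to a plain 2D array
    return C.view(A.dtype).reshape(-1, ncols)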

The following data is used for checking the issue:

min_dis=400
Xt = np.arange(50, 3950, min_dis)
Yt = np.arange(50, 3950, min_dis)

Xt, Yt = np.meshgrid(Xt, Yt)
Xt[::2] += min_dis // 2  # integer shift; min_dis/2 is a float and cannot be added in place to an int array
# This is the super set
turbs_possible_locs = np.vstack([Xt.flatten(), Yt.flatten()]).T
# This is the subset
subset = turbs_possible_locs[np.random.choice(turbs_possible_locs.shape[0],50, replace=False)]
diffs = set_diff2d(turbs_possible_locs, subset)

diffs is supposed to have a shape of 50x2, but it does not.


Solution

  • Ok, so to fix your issue try the following tweak:

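    def set_diff2d(A, B):
        nrows, ncols = A.shape
        dtype={'names':['f{}'.format(i) for i in range(ncols)], 'formats':ncols * [A.dtype]}
        # .copy() returns C-ordered arrays, so each row is viewed as a single record
        C = np.setdiff1d(A.copy().view(dtype), B.copy().view(dtype))
        # C is a 1-D structured array with one record per remaining row
        return C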
    

    The problem was that A, after .view(...) was applied, was broken in half: it had 2 tuple columns instead of 1, unlike B. Applying the dtype is meant to collapse the 2 columns of each row into a single tuple, which is why you can do the set operation in 1D in the first place.

    Quoting the documentation:

    "a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of the array’s memory with a different data-type. This can cause a reinterpretation of the bytes of memory."

    Source: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.view.html
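
To make the "collapse into a tuple" idea concrete, here is a minimal sketch on a small made-up array (the data and the names row_dtype, records and T are illustrative, not from the question). It shows that viewing a C-contiguous (n, 2) array with a two-field dtype gives one record per row, and that an array built with .T is not C-contiguous until it is copied.

```python
import numpy as np

# Small C-contiguous example array (hypothetical data, not from the question).
A = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int64)

# One field per column, all with A's dtype, as in set_diff2d.
ncols = A.shape[1]
row_dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
             'formats': ncols * [A.dtype]}

# Each 2-element row is reinterpreted as a single record.
records = A.view(row_dtype)
print(records.shape)                    # (3, 1)

# An array built with vstack(...).T holds the same values but is not C-contiguous.
T = np.vstack([A[:, 0], A[:, 1]]).T
print(T.flags['C_CONTIGUOUS'])          # False
print(T.copy().flags['C_CONTIGUOUS'])   # True
```
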

    I think this "reinterpretation" is exactly what happened here, so for the sake of simplicity I would just .copy() the array.

    NB: I can't fully explain it, though. It is always A that gets 'broken', whether the view is assigned to a variable first or applied inline; B is always fine.
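
One plausible explanation, offered as an assumption rather than something the answer confirms, is memory layout: turbs_possible_locs is built with .T, so it is not C-contiguous, while .copy() returns a C-ordered array by default. The sketch below follows that idea, using np.ascontiguousarray in place of .copy() and converting the result back to a 2D array. The name set_diff2d_rows is hypothetical, and the integer shift min_dis // 2 is used so that the in-place addition stays within the integer dtype.

```python
import numpy as np

def set_diff2d_rows(A, B):
    """Rows of A that do not appear in B (hypothetical helper name)."""
    # Force C-contiguous memory so each row occupies one contiguous chunk.
    A = np.ascontiguousarray(A)
    # Assumption: B can be safely cast to A's dtype.
    B = np.ascontiguousarray(B, dtype=A.dtype)
    ncols = A.shape[1]
    row_dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
                 'formats': ncols * [A.dtype]}
    # Each row is viewed as one record, so the 1-D set difference works on rows.
    C = np.setdiff1d(A.view(row_dtype), B.view(row_dtype))
    # Convert the records back to a plain (n, ncols) array.
    return C.view(A.dtype).reshape(-1, ncols)

# Data from the question, with an integer shift.
min_dis = 400
Xt, Yt = np.meshgrid(np.arange(50, 3950, min_dis), np.arange(50, 3950, min_dis))
Xt[::2] += min_dis // 2
locs = np.vstack([Xt.flatten(), Yt.flatten()]).T   # not C-contiguous
rng = np.random.default_rng(0)
subset = locs[rng.choice(locs.shape[0], 50, replace=False)]

diffs = set_diff2d_rows(locs, subset)
print(diffs.shape)   # (50, 2)
```

Note that np.setdiff1d returns sorted, unique records, so the rows of the result are not in the original order of A, and duplicate rows in A would be merged.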