Tags: python, numpy, dictionary, set, interception

Python intersection of arrays in dictionary


I have a dictionary of arrays like this:

import numpy as np

y_dict = {1: np.array([5, 124, 169, 111, 122, 184]),
          2: np.array([1, 2, 3, 4, 5, 6, 111, 184]),
          3: np.array([169, 5, 111, 152]),
          4: np.array([0, 567, 5, 78, 90, 111]),
          5: np.array([]),
          6: np.array([])}

I need to find the intersection of the arrays in my dictionary, y_dict. As a first step, I removed the empty arrays from the dictionary:

dic = {i:j for i,j in y_dict.items() if np.array(j).size != 0}

So dic now looks like this:

dic = { 1: np.array([5, 124, 169, 111, 122, 184]),
        2: np.array([1, 2, 3, 4, 5, 6, 111, 184]), 
        3: np.array([169, 5, 111, 152]), 
        4: np.array([0, 567, 5, 78, 90, 111])}

To find the intersection, I tried a tuple-based approach:

result_dic = list(set.intersection(*({tuple(p) for p in v} for v in dic.values())))

The actual result is an empty list: []

The expected result is: [5, 111]

Could you please help me find the intersection of the arrays in the dictionary? Thanks


Solution

  • The code you posted is overcomplicated and wrong: the inner iteration ({tuple(p) for p in v}) walks over the elements of each array instead of converting the array itself, so it has to go. You want to do:

    result_dic = list(set.intersection(*(set(v) for v in dic.values())))
    

    or, with map instead of a generator expression:

    result_dic = list(set.intersection(*(map(set,dic.values()))))
    

    Result:

    [5, 111]
    
    • iterate over the values (the keys are ignored)
    • convert each numpy array to a set (tuples would also be accepted for every argument after the first, but the first one must be a set and nothing is gained by it)
    • pass them all to set.intersection with argument unpacking
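
The three steps above can be written out one at a time. This is only a sketch for illustration: the intermediate names (arrays, sets, common) are not part of the original answer, and the final sorted(int(x) ...) conversion is an added assumption to get plain Python integers in a stable order.

```python
import numpy as np

# Non-empty arrays copied from the question.
dic = {
    1: np.array([5, 124, 169, 111, 122, 184]),
    2: np.array([1, 2, 3, 4, 5, 6, 111, 184]),
    3: np.array([169, 5, 111, 152]),
    4: np.array([0, 567, 5, 78, 90, 111]),
}

arrays = dic.values()               # step 1: values only, keys are ignored
sets = [set(a) for a in arrays]     # step 2: one set per numpy array
common = set.intersection(*sets)    # step 3: unpack the sets as arguments

# Sets are unordered, so sort for a reproducible result.
result_dic = sorted(int(x) for x in common)
print(result_dic)  # [5, 111]
```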

    We can even skip the first step from the question (removing the empty arrays) by creating a set from every array and dropping the empty ones with filter:

    result_dic = list(set.intersection(*(filter(None,map(set,y_dict.values())))))
    

    That is for the sake of a one-liner; in real code, such expressions are better decomposed so they are more readable and easier to comment. Decomposing also helps avoid the TypeError that set.intersection raises when it is called with no arguments (here, when there are no non-empty sets), which is a weakness of this otherwise smart way to intersect sets (first described in "Best way to find the intersection of multiple sets?").
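
To make the failure concrete, here is a small sketch of the crash. The only_empty dictionary is hypothetical input invented for the demonstration, not data from the question.

```python
import numpy as np

# Hypothetical input in which every array is empty.
only_empty = {5: np.array([]), 6: np.array([])}

# filter(None, ...) drops every empty set, so nothing is left to unpack.
non_empty = list(filter(None, map(set, only_empty.values())))

try:
    set.intersection(*non_empty)    # equivalent to set.intersection()
    raised = False
except TypeError:
    raised = True

print(raised)  # True
```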

    Just build the list beforehand, and call intersection only if the list is not empty. If it is empty, return an empty list instead, so that result_dic is a list in both cases:

    non_empty_sets = [set(x) for x in y_dict.values() if x.size]
    result_dic = list(set.intersection(*non_empty_sets)) if non_empty_sets else []
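
If this is needed in more than one place, the guard can be wrapped in a small function. The sketch below is one possible packaging, not part of the original answer: the name intersect_arrays is hypothetical, and using tolist() and sorted() to return plain Python values in a stable order is an added assumption.

```python
import numpy as np

def intersect_arrays(arrays_by_key):
    """Return the sorted values shared by every non-empty array (hypothetical helper)."""
    # tolist() turns numpy scalars into plain Python numbers before building sets.
    non_empty_sets = [set(a.tolist()) for a in arrays_by_key.values() if a.size]
    if not non_empty_sets:
        # Nothing to intersect: avoid calling set.intersection() with no arguments.
        return []
    return sorted(set.intersection(*non_empty_sets))

y_dict = {
    1: np.array([5, 124, 169, 111, 122, 184]),
    2: np.array([1, 2, 3, 4, 5, 6, 111, 184]),
    3: np.array([169, 5, 111, 152]),
    4: np.array([0, 567, 5, 78, 90, 111]),
    5: np.array([]),
    6: np.array([]),
}

print(intersect_arrays(y_dict))                 # [5, 111]
print(intersect_arrays({5: np.array([])}))      # []
```

Returning an empty list for the "no non-empty arrays" case is a design choice; raising an exception would be equally defensible if an empty input indicates a bug upstream.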