python-3.x pandas set series set-union

Python/Pandas element wise union of 2 Series containing sets in each element


I have two pandas Series of the same length, and every element of each Series is a set. I want a computationally efficient way to get the element-wise union of the two Series. Below is a simplified version of my code with short, made-up Series to play with. This implementation is very inefficient, and there has got to be a faster way: my real Series are much longer, and I have to perform this operation hundreds of thousands of times.

import pandas as pd

set_series_1 = pd.Series([{1,2,3}, {'a','b'}, {2.3, 5.4}])
set_series_2 = pd.Series([{2,4,7}, {'a','f','g'}, {0.0, 15.6}])

n = set_series_1.shape[0]
# Overwrite each element of set_series_1 with its union with set_series_2
for i in range(n):
    set_series_1[i] = set_series_1[i].union(set_series_2[i])

print(set_series_1)
>>> set_series_1
0          set([1, 2, 3, 4, 7])
1             set([a, b, g, f])
2    set([0.0, 2.3, 15.6, 5.4])
dtype: object

I've tried combining the two Series into a DataFrame and using apply, but I get an error saying that sets are not supported as DataFrame elements.


Solution

  • pir4

    After testing several options, I finally came up with a good one... pir4 below.


    Testing

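    import pandas as pd

    def jed1(s1, s2):
        # Baseline: the loop from the question, applied to a copy of s1
        s = s1.copy()
        n = s1.shape[0]
        for i in range(n):
            s[i] = s2[i].union(s1[i])
        return s

    def pir1(s1, s2):
        # List comprehension over the raw values; s2[i] assumes a default integer index
        return pd.Series([item.union(s2[i]) for i, item in enumerate(s1.values)], s1.index)

    def pir2(s1, s2):
        # Same idea over (label, value) pairs; items() replaces iteritems(), removed in pandas 2.0
        return pd.Series([item.union(s2[i]) for i, item in s1.items()], s1.index)

    def pir3(s1, s2):
        # Turn each set into a list, add the lists element-wise, then turn them back into sets
        return s1.apply(list).add(s2.apply(list)).apply(set)

    def pir4(s1, s2):
        # Zip the two Series and union each pair; the result gets a fresh default index
        return pd.Series([set.union(*z) for z in zip(s1, s2)])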
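
To try the comparison yourself, here is a minimal sketch that runs the baseline and pir4 on the sample data from the question, checks that they agree, and times them with timeit. The names pir4_keep_index, expected, and timings are my own additions for illustration and are not part of the original answer; pir4_keep_index just shows one way to carry over s1's index, which plain pir4 drops. The timings depend on your machine and pandas version, so none are shown.

```python
import timeit

import pandas as pd


def jed1(s1, s2):
    # Baseline from the question: element-by-element loop on a copy of s1.
    s = s1.copy()
    for i in range(s1.shape[0]):
        s[i] = s2[i].union(s1[i])
    return s


def pir4(s1, s2):
    # Pairwise union over the zipped values; the result has a default index.
    return pd.Series([set.union(*z) for z in zip(s1, s2)])


def pir4_keep_index(s1, s2):
    # Hypothetical variant (an assumption, not from the answer):
    # same pairwise union, but the result reuses s1's index.
    return pd.Series([a | b for a, b in zip(s1, s2)], index=s1.index)


s1 = pd.Series([{1, 2, 3}, {'a', 'b'}, {2.3, 5.4}])
s2 = pd.Series([{2, 4, 7}, {'a', 'f', 'g'}, {0.0, 15.6}])

# Unions worked out by hand from the sample data.
expected = [{1, 2, 3, 4, 7}, {'a', 'b', 'f', 'g'}, {0.0, 2.3, 5.4, 15.6}]

candidates = (jed1, pir4, pir4_keep_index)
all_match = all(f(s1, s2).tolist() == expected for f in candidates)
print("all candidates agree:", all_match)

# Time each candidate over repeated calls.
timings = {}
for f in candidates:
    timings[f.__name__] = timeit.timeit(lambda: f(s1, s2), number=200)

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f} s for 200 calls")
```

With only three elements the timings mostly measure fixed overhead, so swap in Series of realistic length before drawing conclusions.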
    
