python, pandas, logic, xor

Element-wise XOR in pandas


I know that logical AND is & and logical OR is | for a pandas Series, but I am looking for an element-wise logical XOR. I could express it in terms of AND and OR, I suppose, but I'd prefer to use a dedicated XOR if one is available.

Thank you!


Solution

  • Python XOR: a ^ b

    NumPy logical XOR: np.logical_xor(a, b)

    For boolean data both give the same result. Testing performance shows the timings are essentially equal as well:

    1. Sequence of random booleans with size 10000

    In [6]: import numpy as np
    In [7]: a = np.random.choice([True, False], size=10000)
    In [8]: b = np.random.choice([True, False], size=10000)
    
    In [9]: %timeit a ^ b
    The slowest run took 7.61 times longer than the fastest. This could mean that an intermediate result is being cached
    100000 loops, best of 3: 11 us per loop
    
    In [10]: %timeit np.logical_xor(a,b)
    The slowest run took 6.25 times longer than the fastest. This could mean that an intermediate result is being cached
    100000 loops, best of 3: 11 us per loop
    

    2. Sequence of random booleans with size 1000

    In [11]: a = np.random.choice([True, False], size=1000)
    In [12]: b = np.random.choice([True, False], size=1000)
    
    In [13]: %timeit a ^ b
    The slowest run took 21.52 times longer than the fastest. This could mean that an intermediate result is being cached
    1000000 loops, best of 3: 1.58 us per loop
    
    In [14]: %timeit np.logical_xor(a,b)
    The slowest run took 19.45 times longer than the fastest. This could mean that an intermediate result is being cached
    1000000 loops, best of 3: 1.58 us per loop
    

    3. Sequence of random booleans with size 100

    In [15]: a = np.random.choice([True, False], size=100)
    In [16]: b = np.random.choice([True, False], size=100)
    
    In [17]: %timeit a ^ b
    The slowest run took 33.43 times longer than the fastest. This could mean that an intermediate result is being cached
    1000000 loops, best of 3: 614 ns per loop
    
    In [18]: %timeit np.logical_xor(a,b)
    The slowest run took 45.49 times longer than the fastest. This could mean that an intermediate result is being cached
    1000000 loops, best of 3: 616 ns per loop
    

    4. Sequence of random booleans with size 10

    In [19]: a = np.random.choice([True, False], size=10)
    In [20]: b = np.random.choice([True, False], size=10)
    
    In [21]: %timeit a ^ b
    The slowest run took 86.10 times longer than the fastest. This could mean that an intermediate result is being cached
    1000000 loops, best of 3: 509 ns per loop
    
    In [22]: %timeit np.logical_xor(a,b)
    The slowest run took 40.94 times longer than the fastest. This could mean that an intermediate result is being cached
    1000000 loops, best of 3: 511 ns per loop
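
The timings above use NumPy arrays, while the question asks about a pandas Series. As a minimal sketch (the sample values and variable names below are made up for illustration), the same two spellings can be applied to boolean Series directly, alongside the AND/OR formulation mentioned in the question. The last lines illustrate one caveat: on integer data `^` is a bitwise XOR, whereas `np.logical_xor` treats every non-zero value as True, so the two only agree for boolean input.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering all four rows of the truth table.
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

xor_operator = a ^ b              # element-wise XOR via the ^ operator
xor_numpy = np.logical_xor(a, b)  # NumPy function applied to the Series
xor_and_or = (a | b) & ~(a & b)   # XOR spelled out with AND, OR and NOT

print(xor_operator.tolist())
print(np.asarray(xor_numpy).tolist())
print(xor_and_or.tolist())

# Caveat: with integer data the two spellings differ.
ints_x = pd.Series([1, 2])
ints_y = pd.Series([3, 2])
bitwise = ints_x ^ ints_y                  # bitwise XOR of the integers
logical = np.logical_xor(ints_x, ints_y)   # non-zero values count as True

print(bitwise.tolist())
print(np.asarray(logical).tolist())
```

If the Series might hold non-boolean values, converting them first (for example with `astype(bool)`) before using `^` keeps the two approaches consistent.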