Tags: python, pandas, indexing, series, drop

How to delete values from one pandas series that are common to another?


I need to delete the elements of one pandas Series (ser1) that also appear in another pandas Series (ser2).

The closest thing I was able to find works on arrays, using the np.intersect1d() function. It finds the common values, but when I try to drop the indexes that are equal to these values, I get a bunch of errors.

I've tried several other things that did not work either. I have been at it for 3 hours now and am about to give up.

Here are the two series:

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

The result should be:

print(ser1)
0   1
1   2
2   3

I am sure there is a simple solution.


Solution

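  • A NumPy alternative is np.isin. It builds a boolean mask marking the values of ser1 that also appear in ser2; negating the mask with ~ keeps only the values that are not in ser2.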

    import pandas as pd
    import numpy as np
    
    ser1 = pd.Series([1, 2, 3, 4, 5])
    ser2 = pd.Series([4, 5, 6, 7, 8])
    
    # Keep only the values of ser1 that do not appear in ser2
    res = ser1[~np.isin(ser1, ser2)]
    print(res)
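
The np.intersect1d attempt described in the question most likely failed because Series.drop removes entries by index label, not by value. The following is a minimal sketch under that assumption; the names common, labels and drop_by_value_failed are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4, 5])
ser2 = pd.Series([4, 5, 6, 7, 8])

# Values present in both series
common = np.intersect1d(ser1, ser2)

# drop() treats its argument as index labels. ser1 has labels 0..4,
# so label 5 does not exist and this call raises KeyError.
try:
    ser1.drop(common)
    drop_by_value_failed = False
except KeyError:
    drop_by_value_failed = True

# Translate the common values into the labels that hold them, then drop.
labels = ser1.index[ser1.isin(common)]
res = ser1.drop(labels)
print(res)
```

If the common values are not needed for anything else, the boolean-mask form shown above avoids the intermediate step.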
    

    Micro-Benchmark

    import pandas as pd
    import numpy as np

    ser1 = pd.Series([1, 2, 3, 4, 5] * 100)
    ser2 = pd.Series([4, 5, 6, 7, 8] * 10)

    %timeit res = ser1[~np.isin(ser1, ser2)]
    # 136 µs ± 2.56 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

    %timeit res = ser1[~ser1.isin(ser2)]
    # 209 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    %timeit pd.Index(ser1).difference(ser2).to_series()
    # 277 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
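
The three timed expressions do not return the same result once ser1 contains repeated values, which may matter more than the timing differences. The sketch below uses a smaller input than the benchmark, chosen here only for illustration, and the variable names are likewise illustrative.

```python
import numpy as np
import pandas as pd

# ser1 repeats its values, as in the benchmark (shortened for readability)
ser1 = pd.Series([1, 2, 3, 4, 5] * 2)
ser2 = pd.Series([4, 5, 6, 7, 8])

# Boolean masks keep every surviving row and its original index label
mask_np = ser1[~np.isin(ser1, ser2)]
mask_pd = ser1[~ser1.isin(ser2)]

# Index.difference is a set operation: duplicates are removed, and
# to_series() uses the values themselves as the new index
diff = pd.Index(ser1).difference(ser2).to_series()

print(mask_np)
print(diff)
```

Both mask-based versions preserve duplicates and the original index, so they are the closer match to "delete the common elements from ser1". The Index.difference version fits only when a de-duplicated set of values is wanted.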