I'm trying to use .isin with the ~ so I can get a list of unique rows back based on multiple columns in 2 data-sets.
So, I have 2 data-sets with 9 rows: df1 is the bottom and df2 is the top (sorry but I couldn't get it to show both below, it showed 1 then a row of numbers)
Index Serial Count Churn
1 9 5 0
2 8 6 0
3 10 2 1
4 7 4 2
5 7 9 2
6 10 2 2
7 2 9 1
8 9 8 3
9 4 3 5
Index Serial Count Churn
1 10 2 1
2 10 2 1
3 9 3 0
4 8 6 0
5 9 8 0
6 1 9 1
7 10 3 1
8 6 7 1
9 4 8 0
I would like to get a list of rows from df1 which aren't in df2 based on more than 1 column.
For example if I base my search on the columns Serial and Count I wouldn't get Index 1 and 2 back from df1 as it appears in df2 at Index position 6, the same with Index position 4 in df1 as it appears at Index position 2 in df2. The same would apply to Index position 5 in df1 as it is at Index position 8 in df2.
The churn column doesn't really matter.
I can get it to work but based only on 1 column but not on more than 1 column.
df2[~df2.Serial.isin(df1.Serial.values)] kinda does what I want, but only on 1 column. I want it to be based on 2 or more.
Index Serial Count Churn
3 9 3 0
6 1 9 1
7 10 3 1
8 6 7 1
9 4 8 0
One solution is to merge with indicators:
df1 = pd.DataFrame([[10, 2, 0], [9, 4, 1], [9, 8, 1], [8, 6, 1], [9, 8, 1], [1, 9, 1], [10, 3, 1], [6, 7, 1], [4, 8, 1]], columns=['Serial', 'Count', 'Churn'])
df2 = pd.DataFrame([[9, 5, 1], [8, 6, 1], [10, 2, 1], [7, 4, 1], [7, 9, 1], [10, 2, 1], [2, 9, 1], [9, 8, 1], [4, 3, 1]], columns=['Serial', 'Count', 'Churn'])
# merge with indicator on
df_temp = df1.merge(df2[['Serial', 'Count']].drop_duplicates(), on=['Serial', 'Count'], how='left', indicator=True)
res = df_temp.loc[df_temp['_merge'] == 'left_only'].drop('_merge', axis=1)
Output
Serial Count Churn
1 9 4 1
5 1 9 1
6 10 3 1
7 6 7 1
8 4 8 1