I have a sample pandas dataframe df:
col1 col2 col3 col4
0 a 1.0 2.0 3
1 b NaN NaN 6
2 c NaN 8.0 9
3 d NaN 11.0 12
4 e 13.0 14.0 15
5 f 17.0 18.0 19
6 g 21.0 22.0 23
and a second one df1:
col1 col2 col3 col4
0 a 1.0 2.0 3
4 e 13.0 14.0 15
5 f 17.0 18.0 19
6 g 21.0 22.0 23
I want to get the subset of df that does not overlaps with df1. In effect I am looking for the equivalent of the EXCEPT operand in SQL.
I used the subtract() function -- but this was clearly wrong, as the subtract performs elementwise numerical subtraction. So I got an error message:
TypeError: unsupported operand type(s) for -: 'str' and 'str'
So the question is: What is the equivalent of EXCEPT in SQL for Pandas?
I think you need set_index
of all string columns first:
df2 = df.set_index('col1').subtract(df1.set_index('col1'), axis='columns')
print (df2)
col2 col3 col4
col1
a 0.0 0.0 0.0
b NaN NaN NaN
c NaN NaN NaN
d NaN NaN NaN
e 0.0 0.0 0.0
f 0.0 0.0 0.0
g 0.0 0.0 0.0
Or:
df2 = df.set_index('col1').subtract(df1.set_index('col1'), axis='columns', fill_value=0)
print (df2)
col2 col3 col4
col1
a 0.0 0.0 0.0
b NaN NaN 6.0
c NaN 8.0 9.0
d NaN 11.0 12.0
e 0.0 0.0 0.0
f 0.0 0.0 0.0
g 0.0 0.0 0.0
EDIT by edited question:
print (df.isin(df1))
col1 col2 col3 col4
0 True True True True
1 False False False False
2 False False False False
3 False False False False
4 True True True True
5 True True True True
6 True True True True
print (df.isin(df1).all(axis=1))
0 True
1 False
2 False
3 False
4 True
5 True
6 True
dtype: bool
print (~df.isin(df1).all(axis=1))
0 False
1 True
2 True
3 True
4 False
5 False
6 False
dtype: bool
print (df[~(df.isin(df1).all(axis=1))])
col1 col2 col3 col4
1 b NaN NaN 6
2 c NaN 8.0 9
3 d NaN 11.0 12