I am pretty sure that ~ in Pandas is boolean not. I found a couple of StackOverflow questions / answers, but no pointer to official documentation.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pandas as pd
df = pd.DataFrame([(1, 2, 1),
                   (1, 2, 2),
                   (1, 2, 3),
                   (4, 1, 612),
                   (4, 1, 612),
                   (4, 1, 1),
                   (3, 2, 1),
                   ],
                  columns=['groupid', 'a', 'b'],
                  index=['India', 'France', 'England', 'Germany', 'UK', 'USA',
                         'Indonesia'])
print(df)
filtered = df[~(df['a'] == 2)]
print(filtered)
The df is
           groupid  a    b
India            1  2    1
France           1  2    2
England          1  2    3
Germany          4  1  612
UK               4  1  612
USA              4  1    1
Indonesia        3  2    1
and filtered is
         groupid  a    b
Germany        4  1  612
UK             4  1  612
USA            4  1    1
So I'm pretty sure it is boolean not.
The ~ is the operator equivalent of the __invert__ dunder, which has been overridden explicitly for the purpose of performing vectorized logical inversions on pd.DataFrame/pd.Series objects.
s = pd.Series([True, False])
~s
0    False
1     True
dtype: bool
s.__invert__()
0    False
1     True
dtype: bool
Note: Dunder methods should not be called directly in code; always prefer the corresponding operators.
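To illustrate how the ~ operator reaches __invert__, here is a minimal sketch in plain Python. The BoolVector class is a hypothetical stand-in invented for this example; it is not part of pandas and only mimics the idea of element-wise inversion.

```python
import operator


class BoolVector:
    """Hypothetical stand-in for a boolean Series; not a pandas class."""

    def __init__(self, values):
        self.values = [bool(v) for v in values]

    def __invert__(self):
        # Python calls this method whenever `~instance` is evaluated.
        return BoolVector(not v for v in self.values)

    def __repr__(self):
        return f"BoolVector({self.values})"


v = BoolVector([True, False])
inverted = ~v                           # dispatches to v.__invert__()
same_via_operator = operator.invert(v)  # functional spelling of ~

print(inverted)           # BoolVector([False, True])
print(same_via_operator)  # BoolVector([False, True])
print(~1)                 # -2: on a plain int, ~ is bitwise inversion
```

The last line hints at why a container type would want to override __invert__: on a plain Python integer, ~ is bitwise rather than logical.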
Also, since you've asked, the section on Boolean Indexing describes its use.
Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses.
Bold emphasis mine.
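As a rough sketch of what "grouped by using parentheses" means in practice, the following reuses the DataFrame from the question and combines conditions with ~, & and |. The variable names not_a2_and_b1 and a2_or_b612 are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"groupid": [1, 1, 1, 4, 4, 4, 3],
     "a": [2, 2, 2, 1, 1, 1, 2],
     "b": [1, 2, 3, 612, 612, 1, 1]},
    index=["India", "France", "England", "Germany", "UK", "USA", "Indonesia"],
)

# not (a == 2) and (b == 1): each comparison sits in its own parentheses
# because ~, & and | bind more tightly than == in Python.
not_a2_and_b1 = df[~(df["a"] == 2) & (df["b"] == 1)]

# (a == 2) or (b == 612)
a2_or_b612 = df[(df["a"] == 2) | (df["b"] == 612)]

print(not_a2_and_b1.index.tolist())  # ['USA']
print(a2_or_b612.index.tolist())
# ['India', 'France', 'England', 'Germany', 'UK', 'Indonesia']
```

Without the parentheses, an expression such as df["a"] == 2 & df["b"] == 1 would be parsed with 2 & df["b"] evaluated first, which is not the intended filter.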