
Where is official documentation for tilde (~) in Pandas?


I am pretty sure that ~ in Pandas is boolean not. I found a couple of StackOverflow questions / answers, but no pointer to official documentation.

Sanity Check

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pandas as pd


df = pd.DataFrame([(1, 2, 1),
                   (1, 2, 2),
                   (1, 2, 3),
                   (4, 1, 612),
                   (4, 1, 612),
                   (4, 1, 1),
                   (3, 2, 1),
                   ],
                  columns=['groupid', 'a', 'b'],
                  index=['India', 'France', 'England', 'Germany', 'UK', 'USA',
                         'Indonesia'])

print(df)
filtered = df[~(df['a'] == 2)]
print(filtered)

The df is

           groupid  a    b
India            1  2    1
France           1  2    2
England          1  2    3
Germany          4  1  612
UK               4  1  612
USA              4  1    1
Indonesia        3  2    1

and filtered is

         groupid  a    b
Germany        4  1  612
UK             4  1  612
USA            4  1    1

So I'm pretty sure it is boolean not.


Solution

  • The ~ operator invokes the __invert__ dunder method, which pandas overrides explicitly to perform vectorized logical inversion on pd.DataFrame/pd.Series objects.

    s = pd.Series([True, False])
    
    ~s
    
    0    False
    1     True
    dtype: bool
    
    s.__invert__()
    
    0    False
    1     True
    dtype: bool
    

    Note: Dunder methods should not be called directly in code; always prefer the corresponding operators.

    Also, since you asked, the Boolean Indexing section of the official documentation describes its use:

    "Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses."

    Bold emphasis mine.
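
The parentheses requirement in the quoted passage comes from Python's operator precedence: ~, & and | all bind more tightly than comparisons such as == and >. Below is a minimal sketch of combining the three operators, reusing the data from the question. The names not_and, not_or and ints are made up for illustration. The last lines show a related caveat that the quoted section does not cover: on an integer Series, ~ is a bitwise inversion rather than a logical one.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame(
    {"groupid": [1, 1, 1, 4, 4, 4, 3],
     "a": [2, 2, 2, 1, 1, 1, 2],
     "b": [1, 2, 3, 612, 612, 1, 1]},
    index=["India", "France", "England", "Germany", "UK", "USA", "Indonesia"],
)

# NOT with AND: rows where a is not 2 and b is greater than 1.
# Each comparison is wrapped in parentheses before ~ or & is applied.
not_and = df[~(df["a"] == 2) & (df["b"] > 1)]
print(not_and.index.tolist())  # ['Germany', 'UK']

# NOT with OR: rows where groupid is not 4, or b equals 612.
not_or = df[~(df["groupid"] == 4) | (df["b"] == 612)]
print(not_or.index.tolist())  # every row except 'USA'

# Caveat: on integer data, ~ flips bits instead of truth values.
ints = ~pd.Series([1, 2])
print(ints.tolist())  # [-2, -3]
```

Without the parentheses, an expression like ~df["a"] == 2 would invert the integer column first and only then compare it with 2, which is rarely what is intended.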