python pandas itertools-groupby

Can itertools.groupby use pd.NA?


I tried using itertools.groupby on a pandas Series, but I got:

TypeError: boolean value of NA is ambiguous

Indeed some of my values are NA.

This is a minimal reproducible example:

import pandas as pd
import itertools

g = itertools.groupby([pd.NA,0])
next(g)  # first group (key is pd.NA) is returned
next(g)  # raises TypeError: boolean value of NA is ambiguous

Comparing NA with another value results in NA rather than True or False, so when g.__next__ compares consecutive keys it has to take the truth value of NA, and that fails.
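
The behaviour described above can be checked in isolation. This is a minimal sketch, assuming a pandas version that provides pd.NA (1.0 or later); the variable names are only illustrative.

```python
import pandas as pd

# Comparing pd.NA with an ordinary value does not yield True or False;
# the result of the comparison is pd.NA itself.
comparison = (pd.NA == 0)
print(comparison is pd.NA)

# Taking the truth value of pd.NA is the step that raises the error
# seen in the traceback from itertools.groupby.
try:
    bool(pd.NA)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)
```

itertools.groupby needs a plain boolean from the equality check between consecutive keys, which is why the NA result cannot be used there.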

Is there a way to solve this so that itertools.groupby works with NA values? Or should I just accept it and take a different route to my goal?


Solution

  • How about passing a key function to itertools.groupby that converts pd.NA to None? Since == does not return a plain boolean when pd.NA is involved, the key function can use the is operator to perform an identity comparison instead (pd.NA is a singleton).

    import pandas as pd
    import itertools
    
    arr = [pd.NA, pd.NA, 0, 1, 1]

    # Identity check avoids the ambiguous == comparison with pd.NA.
    def keyfunc(x):
        return None if x is pd.NA else x

    for key, group in itertools.groupby(arr, key=keyfunc):
        print(key, list(group))
    

    Output:

    None [<NA>, <NA>]
    0 [0]
    1 [1, 1]
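
One caveat with mapping pd.NA to None is that genuine None values in the data would land in the same group as adjacent NA values. A possible variant, sketched here under the assumption that the two should stay separate, maps pd.NA to a private sentinel object instead. The names `_MISSING` and `na_safe_key` are hypothetical and not part of pandas or itertools.

```python
import itertools
import pandas as pd

# Hypothetical sentinel: a unique object that compares equal only to itself.
_MISSING = object()

def na_safe_key(value):
    """Map pd.NA to the sentinel; pass every other value through unchanged."""
    return _MISSING if value is pd.NA else value

values = [pd.NA, pd.NA, None, 0, 1, 1]

# Each entry is (is_na_group, members); the flag is True for the NA run.
groups = [
    (key is _MISSING, list(group))
    for key, group in itertools.groupby(values, key=na_safe_key)
]

for is_na, members in groups:
    print("NA" if is_na else "value", members)
```

If NaN and None should also count as missing, the identity test could probably be swapped for `pd.isna(value)` on scalar values, at the cost of merging those cases into one group again.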