python, python-3.x, pandas, truthiness

Python 3 lambda error: The truth value of a Series is ambiguous


I am getting the error "The truth value of a Series is ambiguous" in my lambda function. I know there is a very comprehensive explanation of this error, but I don't think it applies to my issue: Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Basically I am trying to determine via lambda whether OpenBal is the same from one month to the next within the same AccountID and give me a '1' if it is the same (e.g. for OpenBal=101 below). Obviously the first record should give me a NaN. (P.S. thanks @jdehesa for your answers in my other post).

This demonstrates my problem:

import pandas as pd
df = pd.DataFrame({'AccountID': [1,1,1,1,2,2,2,2,2],
                   'RefMonth':    [1,2,3,4,1,2,3,4,5],
                   'OpenBal':    [100,101,101,103,200,201,202,203,204]})
SameBal = df.groupby('AccountID').apply(lambda g: 1 if g['OpenBal'].diff() == 0 else 0)
df['SameBal'] = SameBal.sortlevel(1).values

Solution

  • Your error correctly indicates that you can't check the truthiness of a Series. But a custom anonymous function is not necessary for this task.

    Using groupby + transform with pd.Series.diff:

    import pandas as pd
    
    df = pd.DataFrame({'AccountID': [1,1,1,1,2,2,2,2,2],
                       'RefMonth':    [1,2,3,4,1,2,3,4,5],
                       'OpenBal':    [100,101,101,103,200,201,202,203,204]})
    
    df['A'] = (df.groupby('AccountID')['OpenBal'].transform(pd.Series.diff)==0).astype(int)
    
    print(df)
    
       AccountID  OpenBal  RefMonth   A
    0          1      100         1   0
    1          1      101         2   0
    2          1      101         3   1
    3          1      103         4   0
    4          2      200         1   0
    5          2      201         2   0
    6          2      202         3   0
    7          2      203         4   0
    8          2      204         5   0
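
As a possible simplification, the same column can likely be built by calling diff directly on the grouped column instead of passing pd.Series.diff to transform. This is a minimal sketch, assuming a pandas version where the grouped Series exposes diff and the Series exposes eq; the name delta is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'AccountID': [1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'RefMonth':  [1, 2, 3, 4, 1, 2, 3, 4, 5],
                   'OpenBal':   [100, 101, 101, 103, 200, 201, 202, 203, 204]})

# Difference to the previous row, computed separately within each AccountID.
delta = df.groupby('AccountID')['OpenBal'].diff()

# eq(0) is the method form of == 0. The first row of each group has a NaN
# difference, which compares as False and therefore becomes 0.
df['A'] = delta.eq(0).astype(int)

print(df['A'].tolist())
```

The result should match the transform version above, since both compute the per-group difference and align it back to the original index.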
    

    If you need NaN for the first row of each group:

    import numpy as np

    g = df.groupby('AccountID')['OpenBal'].transform(pd.Series.diff)
    df['A'] = (g == 0).astype(float)  # float, so the column can hold NaN
    df.loc[g.isnull(), 'A'] = np.nan
    
    print(df)
    
       AccountID  OpenBal  RefMonth    A
    0          1      100         1  NaN
    1          1      101         2  0.0
    2          1      101         3  1.0
    3          1      103         4  0.0
    4          2      200         1  NaN
    5          2      201         2  0.0
    6          2      202         3  0.0
    7          2      203         4  0.0
    8          2      204         5  0.0
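
To see why the original lambda fails, it may help to look at what the condition evaluates to for a single group. The sketch below reuses the question's data; the names group, mask and error_message are only illustrative. The comparison produces a boolean Series with one value per row, and the conditional expression then asks pandas for a single truth value, which it refuses to give.

```python
import pandas as pd

df = pd.DataFrame({'AccountID': [1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'RefMonth':  [1, 2, 3, 4, 1, 2, 3, 4, 5],
                   'OpenBal':   [100, 101, 101, 103, 200, 201, 202, 203, 204]})

# Take one group, roughly as the lambda would receive it.
group = df[df['AccountID'] == 1]

# The comparison is element-wise: it yields a boolean Series, not one bool.
mask = group['OpenBal'].diff() == 0

# `1 if mask else 0` calls bool(mask), which pandas rejects for a Series.
try:
    result = 1 if mask else 0
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(mask.tolist())
print(error_message)
```

Because the goal is one flag per row rather than one flag per group, converting the boolean Series with astype, as in the answer above, avoids the conditional entirely.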