Tags: python, pandas, numpy, diff, boolean-operations

Can we use diff() in a condition in Python?


I am new to Python and am trying to implement the algorithm below, but I get an error (ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().). I want to count how many times in a row the values of 'A' are equal.

import numpy as np
import pandas as pd
n=10000000
count=0
df=pd.DataFrame(np.random.randint(0,2,size=(n,1)),columns=['A'])
check =df['A'].diff().eq(0).astype(int)
if check==0:
   count = count+1
   df['B']=count
print(df)

Solution

  • Try creating a boolean index based on where the current value is not equal to the previous value.

    Sample df:

    df = pd.DataFrame({'A': [1, 1, 0, 0, 1, 1, 1]})
    
       A
    0  1
    1  1
    2  0
    3  0
    4  1
    5  1
    6  1
    

    Index with shift + cumsum:

    df['A'].ne(df['A'].shift()).cumsum()
    
    0    1
    1    1
    2    2
    3    2
    4    3
    5    3
    6    3
    Name: A, dtype: int32
    

    Use the index in groupby + cumcount to number the rows within each group:

    df['B'] = df.groupby(df['A'].ne(df['A'].shift()).cumsum()).cumcount()
    
       A  B
    0  1  0
    1  1  1
    2  0  0
    3  0  1
    4  1  0
    5  1  1
    6  1  2
    

    Complete Code:

    import pandas as pd
    
    df = pd.DataFrame({'A': [1, 1, 0, 0, 1, 1, 1]})
    df['B'] = df.groupby(df['A'].ne(df['A'].shift()).cumsum()).cumcount()
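
To answer the title question directly: diff() can drive the same grouping, as long as its result is used as a whole Series rather than inside a plain if statement. The sketch below is one way to do it, assuming column A is numeric (diff() subtracts values, so it does not work on strings). The names same_as_prev, has_repeat and run_id are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 0, 0, 1, 1, 1]})

# diff() is 0 wherever a value equals the one before it; the first row is NaN,
# so eq(0) is False there.
same_as_prev = df['A'].diff().eq(0)

# A whole Series has no single truth value, so `if same_as_prev:` raises the
# ValueError from the question. Reduce it explicitly when one bool is needed.
has_repeat = same_as_prev.any()

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those starts labels each run.
run_id = (~same_as_prev).cumsum()

# Number the rows within each run, starting from 0.
df['B'] = df.groupby(run_id).cumcount()

print(df)
```

For this sample the B column comes out the same as with the shift-based version above. The shift-based comparison is the more general of the two, since ne() also works on non-numeric columns.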