I am new to Python and tried to implement the algorithm below, but I got an error (ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().). I want to count how many times in a row the value of 'A' stays the same.
import numpy as np
import pandas as pd
n=10000000
count=0
df=pd.DataFrame(np.random.randint(0,2,size=(n,1)),columns=['A'])
check =df['A'].diff().eq(0).astype(int)
if check==0:
    count = count+1
df['B']=count
print(df)
Try building a grouping key that increases by one wherever the current value is not equal to the previous value, so that each run of equal values gets its own label.
Sample df:
df = pd.DataFrame({'A': [1, 1, 0, 0, 1, 1, 1]})
   A
0  1
1  1
2  0
3  0
4  1
5  1
6  1
df['A'].ne(df['A'].shift()).cumsum()
0    1
1    1
2    2
3    2
4    3
5    3
6    3
Name: A, dtype: int32
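To see why this expression labels each run, it may help to split it into its intermediate steps. The sketch below uses the same sample df; the names prev, changed, run_id and steps are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 0, 0, 1, 1, 1]})

# Illustrative names (not from the original answer): prev, changed, run_id, steps
prev = df['A'].shift()       # value from the previous row; NaN for the first row
changed = df['A'].ne(prev)   # True wherever the value differs from the previous row
run_id = changed.cumsum()    # each True adds one, so every run gets its own label

# Put the steps side by side for inspection
steps = pd.DataFrame({'A': df['A'], 'prev': prev, 'changed': changed, 'run_id': run_id})
print(steps)
```

The first row is compared against NaN, so it always counts as the start of a run, which is why the labels start at 1.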
Use that result as the key in groupby cumcount to enumerate the rows within each group:
df['B'] = df.groupby(df['A'].ne(df['A'].shift()).cumsum()).cumcount()
   A  B
0  1  0
1  1  1
2  0  0
3  0  1
4  1  0
5  1  1
6  1  2
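For comparison, here is a plain-Python loop that appears to be what the question was aiming for, assuming the goal is the 0-based position of each row within its run of equal values. The helper name run_positions is hypothetical; it is only meant as a cross-check of the groupby result on small inputs, since a row-by-row loop would likely be much slower on 10,000,000 rows.

```python
import pandas as pd

def run_positions(values):
    """Hypothetical helper: 0-based position of each element within its run of equal values."""
    out = []
    count = 0
    prev = object()  # sentinel that compares unequal to every real value
    for v in values:
        # continue the current run, or restart the counter when the value changes
        count = count + 1 if v == prev else 0
        out.append(count)
        prev = v
    return out

df = pd.DataFrame({'A': [1, 1, 0, 0, 1, 1, 1]})
df['B'] = df.groupby(df['A'].ne(df['A'].shift()).cumsum()).cumcount()

# The loop and the vectorized version should agree row by row
loop_result = run_positions(df['A'].tolist())
print(loop_result == df['B'].tolist())
```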
Complete Code:
import pandas as pd
df = pd.DataFrame({'A': [1, 1, 0, 0, 1, 1, 1]})
df['B'] = df.groupby(df['A'].ne(df['A'].shift()).cumsum()).cumcount()
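As for the error itself: check==0 compares every row at once and returns a whole Series of booleans, and an if statement cannot reduce that to a single True or False, which is what the ValueError is complaining about. The sketch below reproduces this on random data shaped like the question's, with a smaller row count (1000, an assumption to keep it quick), and then applies the groupby approach instead of a row-wise if.

```python
import numpy as np
import pandas as pd

# Same shape as the question's data, but with fewer rows (assumed size for a quick run)
df = pd.DataFrame(np.random.randint(0, 2, size=(1000, 1)), columns=['A'])
check = df['A'].diff().eq(0).astype(int)

raised = False
try:
    if check == 0:   # check == 0 is a Series, not a single bool
        pass
except ValueError as exc:
    raised = True
    print(exc)

# Vectorized replacement: no if statement, the whole column is computed at once
df['B'] = df.groupby(df['A'].ne(df['A'].shift()).cumsum()).cumcount()
print(df.head(10))
```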