I have the following code:
data={'id':[1,2,3,4,5,6,7,8,9,10,11],
'value':[1,0,1,0,1,1,1,0,0,1,0]}
df=pd.DataFrame.from_dict(data)
df
Out[8]:
id value
0 1 1
1 2 0
2 3 1
3 4 0
4 5 1
5 6 1
6 7 1
7 8 0
8 9 0
9 10 1
10 11 0
I want to create a flag column that indicate with 1 consecutive values starting from the second occurrence and ignoring the first.
With the actual solution:
df['flag'] =
df.value.groupby([df.value,df.flag.diff().ne(0).cumsum()]).transform('size').ge(3).astype(int)
Out[8]:
id value flag
0 1 1 0
1 2 0 0
2 3 1 0
3 4 0 0
4 5 1 1
5 6 1 1
6 7 1 1
7 8 0 1
8 9 0 1
9 10 1 0
10 11 0 0
While I need a solution like this, where the first occurence is flagged as 0 and 1 starting from the second:
Out[8]:
id value flag
0 1 1 0
1 2 0 0
2 3 1 0
3 4 0 0
4 5 1 0
5 6 1 1
6 7 1 1
7 8 0 0
8 9 0 1
9 10 1 0
10 11 0 0
Create consecutive groups by compared Series.shift
ed values by not equal and Series.cumsum
, create counter by GroupBy.cumcount
and compare if greater values like 0
by Series.gt
, last map True, False
to 1, 0
by casting to integers by Series.astype
:
df['flag'] = (df.groupby(df['value'].ne(df['value'].shift()).cumsum())
.cumcount()
.gt(0)
.astype(int))
print (df)
id value flag
0 1 1 0
1 2 0 0
2 3 1 0
3 4 0 0
4 5 1 0
5 6 1 1
6 7 1 1
7 8 0 0
8 9 0 1
9 10 1 0
10 11 0 0
How it working:
print (df.assign(g = df['value'].ne(df['value'].shift()).cumsum(),
counter = df.groupby(df['value'].ne(df['value'].shift()).cumsum()).cumcount(),
mask = df.groupby(df['value'].ne(df['value'].shift()).cumsum()).cumcount().gt(0)))
id value g counter mask
0 1 1 1 0 False
1 2 0 2 0 False
2 3 1 3 0 False
3 4 0 4 0 False
4 5 1 5 0 False
5 6 1 5 1 True
6 7 1 5 2 True
7 8 0 6 0 False
8 9 0 6 1 True
9 10 1 7 0 False
10 11 0 8 0 False