I have the following DataFrame:
df =
sample_type data_window
A 1
A 1
A 1
B 1
B 1
B 8
B 1
C 2
C 1
C 1
I'm trying to count the maximum consecutive identical values =1 in data_window
for each sample_type
.
I'd like an output DataFrame like below:
df =
sample_type cum_count
A 3
B 2
C 2
I've tried an approach similar to this: groupby consecutive identical values in pandas dataframe and cumulative count of the number of occurences, using groupby
and cumsum
, but can't get it to work.
Any help would be greatly appreciated!
You can use groupby_agg
with a custom function:
max_consecutive_zeroes = lambda x: x.loc[x.eq(1)].groupby(x.ne(1).cumsum()).size().max()
out = df.groupby('sample_type', as_index=False)['data_window'].agg(max_consecutive_zeroes)
Output:
>>> out
sample_type data_window
0 A 3
1 B 2
2 C 2
Example.
Suppose the following series (only one group, so don't need the initial groupby_agg
here):
>>> x
0 1 # First group (2)
1 1 #
2 6
3 1 # Second group (1)
4 7
5 9
6 1 # Third group (3, the longest)
7 1 #
8 1 #
9 3
dtype: int64
Step by step:
# Remove other rows where values are not 1
>>> out = x.loc[x.eq(1)]
0 1
1 1
3 1
6 1
7 1
8 1
dtype: int64
# Create virtual groups based on values are not 1
>>> out = out.groupby(x.ne(1).cumsum())
[(0, # First group
0 1
1 1
dtype: int64),
(1, # Second group
3 1
dtype: int64),
(3, # Third group
6 1
7 1
8 1
dtype: int64)]
# Get the size and keep the max
>>> out.size().max()
3