I have a data set that I have broken into segments based on a certain criteria. In another column I have it where the code returns 'True' if the value is equal to the one before it and 'False' if it is not equal. I am able to get the total count of 'False' values for the entire data set, but I am trying to get the total count of 'False' values per segment.
My code:
df['cols2'] = df['cols1'].diff().eq(0).replace({False : 0, True : 1})
counter_obj = Counter(df['cols2'])
false_count = counter_obj[False]
seg = df.groupby('segID')[cols2 , false_count].sum()
print(seg)
import pandas as pd
df = pd.DataFrame({'cols1': [1, 2, 3, 3, 4, 5, 5, 6, 7]})
df['cols2'] = df['cols1'].diff().eq(0).replace({False: 0, True: 1})
df['cs'] = df['cols2'].cumsum()
I can suggest creating a 'cs'
column with a cumulative sum
of 'cols2'
in order to divide the dataframe into groups
. As far as I understood you, you need to count only zeros in each segment.
Input
cols1 cols2 cs
0 1 0 0
1 2 0 0
2 3 0 0
3 3 1 1
4 4 0 1
5 5 0 1
6 5 1 2
7 6 0 2
8 7 0 2
To aggregate
a count by group:
agr = df.groupby('cs').apply(lambda x: x.loc[x['cols2'] == 0, 'cols2'].count())
Output
cs
0 3
1 2
2 2
and if need data for each row
:
df['count'] = df.groupby('cs')['cols2'].transform(lambda x: x[x==0].count())
Output
cols1 cols2 cs count
0 1 0 0 3
1 2 0 0 3
2 3 0 0 3
3 3 1 1 2
4 4 0 1 2
5 5 0 1 2
6 5 1 2 2
7 6 0 2 2
8 7 0 2 2