Python Pandas: Count the number of consecutive values and assign a name if conditions are met


For example I have created this data frame:

import pandas as pd

df = pd.DataFrame({'Cycle': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                             2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4,
                             4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]})


#Maybe something like this: df['Cycle Type'] = df['Cycle'].rolling(2).apply(lambda x: len(set(x)) != len(x),raw= True).replace({0 : False, 1: True})

I want to count the number of rows in each cycle and then assign a cycle type to it. If a cycle has fewer than 12 rows or more than 100 rows, mark it as bad; otherwise mark it as good. I was thinking of using something like the lambda function above to check whether the value in the previous row was the same, but I'm not sure how to add the count so that I can apply these thresholds.


Solution

  • Start by counting the number of rows in each group with pandas.DataFrame.groupby and SeriesGroupBy.transform, passing "count" as the aggregation:

    df["cycle_quality"] = df.groupby("Cycle")["Cycle"].transform("count")
    

    Then apply the quality function to it using pandas.DataFrame.apply:

    • If the number of rows is less than 12 or more than 100, define cycle_quality as bad

    • Otherwise, define cycle_quality as good

    df["cycle_quality"] = df.apply(lambda x: "bad" if x["cycle_quality"] < 12 or x["cycle_quality"] > 100 else "good", axis=1)
    
    [Out]:
        Cycle cycle_quality
    0       0          good
    1       0          good
    2       0          good
    3       0          good
    4       0          good
    ..    ...           ...
    71      5           bad
    72      5           bad
    73      5           bad
    74      5           bad
    75      5           bad
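
A row-wise apply with axis=1 calls the lambda once per row, which can become slow on large frames. As an alternative, here is a minimal vectorized sketch of the same rule. It assumes the inclusive 12 to 100 range described above; the names lengths, rows_per_cycle, and summary are illustrative and not part of the original answer, and the frame is rebuilt with numpy.repeat only to keep the example short (it has the same cycle lengths as the question's data).

```python
import numpy as np
import pandas as pd

# Same cycle lengths as the question's data: 12, 6, 13, 21, 13 and 11 rows.
lengths = [12, 6, 13, 21, 13, 11]
df = pd.DataFrame({"Cycle": np.repeat(np.arange(len(lengths)), lengths)})

# Rows per cycle, broadcast back onto every row of that cycle.
rows_per_cycle = df.groupby("Cycle")["Cycle"].transform("count")

# between() is inclusive on both ends by default, so 12..100 rows is "good".
df["cycle_quality"] = np.where(rows_per_cycle.between(12, 100), "good", "bad")

# One label per cycle, for a quick check of the result.
summary = df.drop_duplicates("Cycle").set_index("Cycle")["cycle_quality"].to_dict()
print(summary)
```

With this data, cycles 1 and 5 (6 and 11 rows) come out as bad and the others as good, matching the apply-based version. Note that groupby("Cycle") pools all rows sharing a value, so if the same cycle number could reappear later in the frame as a separate run, the grouping key would need to identify runs rather than values.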