Search code examples
pythonpandasdataframepandas-groupbyfrequency

Counting occurrence of values after using groupby on multiple pandas columns


My input dataframe is as follows :

Input data

generated using the following lines of code

 l = [["a", 12, 12], ["a", 12, 33.], ["b", 12.3, 12.3], ["a", 13, 1]]
df = pd.DataFrame(l, columns=["a", "b", "c"])

I am currently able to cumulatively count the frequency as follows

Output

using

df['freq'] = df.groupby(by=["a","b"]).cumcount()+1

which takes into account common values in column a and column b and counts them. However I would like to add to the freq count only when column b values are different while column a values are the same. The picture below shows a form of desired output :

Desired output

How do I achieve this in pandas in an optimised manner?


Solution

  • You can use double .groupby. For example:

    df["freq"] = df.groupby("a")["b"].apply(lambda x: x.groupby(x).ngroup() + 1)
    print(df)
    

    Prints:

       a     b     c  freq
    0  a  12.0  12.0     1
    1  a  12.0  33.0     1
    2  b  12.3  12.3     1
    3  a  13.0   1.0     2