python pandas pandas-groupby pandas-apply

Faster Way to GroupBy Apply Python Pandas?


How can I make this groupby().apply() call run faster, or rewrite it a different way?

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'value': [1, 2, np.nan, 3, np.nan, 1, 2, np.nan, 4, np.nan]})

# Per ID: count the non-null values if the group contains a 1 and no 4, else 0
result = df.groupby("ID").apply(
    lambda x: len(x[x['value'].notnull()].index)
    if (len(x[x['value'] == 1].index) >= 1) and (len(x[x['value'] == 4].index) == 0)
    else 0)

output:

Index  0  
1      3  
2      0

My program runs very slowly right now. Can I make it faster? In the past I have filtered before calling groupby(), but I don't see an easy way to do that in this situation.


Solution

  • I'm not sure if this is what you need. I have broken it into separate steps, but you can easily method-chain it to make the code more compact:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(
        {
            "ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
            "value": [1, 2, np.nan, 3, np.nan, 1, 2, np.nan, 4, np.nan],
        }
    )
    
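    # Flag the rows that hold the two values the rule depends on
    df["x1"] = df["value"] == 1
    df["x2"] = df["value"] == 4

    # One aggregation per ID: is there a 1, is there a 4, how many non-null values
    df2 = df.groupby("ID").agg(
        y1=pd.NamedAgg(column="x1", aggfunc="max"),
        y2=pd.NamedAgg(column="x2", aggfunc="max"),
        cnt=pd.NamedAgg(column="value", aggfunc="count"),
    )

    # Keep the count only when a 1 is present and no 4 is present, else 0
    df3 = df2.assign(z=lambda x: (x['y1'] & ~x['y2'])*x['cnt'])

    result = df3.drop(columns=['y1', 'y2', 'cnt'])
    print(result)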
    

    which will yield

        z
    ID   
    1   3
    2   0
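
For reference, here is a rough sketch of what the method-chained form might look like, together with a check against a per-group `apply`. The names `has_1`, `has_4`, `fast`, `slow_rule` and `slow` are my own placeholders, and I have swapped `"max"` for `"any"` and the multiplication for `Series.where`, so treat it as one possible variant rather than the only way to write it:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
        "value": [1, 2, np.nan, 3, np.nan, 1, 2, np.nan, 4, np.nan],
    }
)

# Build the flag columns, then aggregate everything in a single groupby pass.
# has_1 / has_4 are placeholder names, not part of the original question.
agg = (
    df.assign(has_1=df["value"].eq(1), has_4=df["value"].eq(4))
    .groupby("ID")
    .agg(has_1=("has_1", "any"), has_4=("has_4", "any"), cnt=("value", "count"))
)

# Keep the non-null count where the group has a 1 and no 4, otherwise 0.
fast = agg["cnt"].where(agg["has_1"] & ~agg["has_4"], 0)


def slow_rule(g):
    """Same rule applied group by group, used here only as a cross-check."""
    has_1 = (g["value"] == 1).any()
    has_4 = (g["value"] == 4).any()
    return g["value"].notna().sum() if has_1 and not has_4 else 0


# Selecting [["value"]] keeps the grouping column out of each group frame.
slow = df.groupby("ID")[["value"]].apply(slow_rule)

# Both approaches give 3 for ID 1 and 0 for ID 2 on this sample data.
assert fast.tolist() == slow.tolist()
```

The speed-up comes from replacing the Python-level lambda, which runs once per group, with built-in aggregations. How much faster it is will depend on the number of groups in your data, so it is worth timing both versions on a realistic sample.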