
Vectorizing an apply function in pandas


I have a DataFrame that I group by issue_id, and I want to apply a custom function to each group. The data looks as follows:

import pandas as pd
import numpy as np
sub_test = pd.DataFrame(
    columns=['issue_id', 'step', 'conversion_rate'],
    data=[['01-abc-234', 0, 0.45], ['01-abc-234', 1, 0.35], ['01-abc-234', 2, 0.15], ['01-abc-234', 3, 1],
          ['02-abc-234', 0, 0.05], ['02-abc-234', 1, 0.15], ['02-abc-234', 2, 0.65], ['02-abc-234', 3, 1]])
sub_test.info()

I want to group by issue_id and apply the following function to each group:

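def calculate_conversion_step(step, df):
    # Step 0: product of the conversion rates at steps 1 and 2
    if step == 0:
        return np.prod(df.loc[df['step'].isin([1, 2]), 'conversion_rate'])
    # Step 1: product of the conversion rate(s) at step 2
    elif step == 1:
        return np.prod(df.loc[df['step'] == 2, 'conversion_rate'])
    # Any later step
    else:
        return 1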

Basically, I iterate over the issue IDs, filter the DataFrame down to each one, and apply the function above to every row of that subset. I used .apply(), but my DataFrame is too large for this to run in a reasonable time, since the function re-filters the whole group for every row.

final_parts = []
for issue_id in sub_test['issue_id'].unique():
    # .copy() avoids SettingWithCopyWarning when adding a column to the slice
    int_df = sub_test[sub_test['issue_id'] == issue_id].copy()
    # Apply 'calculate_conversion_step' to every row to get 'conversion_step'
    int_df['conversion_step'] = int_df['step'].apply(lambda x: calculate_conversion_step(x, int_df))
    # Collect the result for each issue
    final_parts.append(int_df)

final = pd.concat(final_parts)

Is there any way I can make it faster?

This is my expected output: [image]
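Even before full vectorization, the loop above does more work than it needs to: it re-filters the group once per row, although each issue only needs two products. A minimal sketch of that smaller change, assuming the sample sub_test data above (the names parts, p12, p2 and step_value are illustrative), computes both products once per issue, maps them onto the steps, and concatenates once at the end:

```python
import pandas as pd

# Sample data from the question
sub_test = pd.DataFrame(
    columns=['issue_id', 'step', 'conversion_rate'],
    data=[['01-abc-234', 0, 0.45], ['01-abc-234', 1, 0.35],
          ['01-abc-234', 2, 0.15], ['01-abc-234', 3, 1],
          ['02-abc-234', 0, 0.05], ['02-abc-234', 1, 0.15],
          ['02-abc-234', 2, 0.65], ['02-abc-234', 3, 1]],
)

parts = []
for issue_id, group in sub_test.groupby('issue_id', sort=False):
    # Compute the two products once per issue instead of once per row.
    # An empty selection gives Series.prod() == 1.0, like np.prod([]).
    p12 = group.loc[group['step'].isin([1, 2]), 'conversion_rate'].prod()
    p2 = group.loc[group['step'].eq(2), 'conversion_rate'].prod()
    # Step 0 -> p12, step 1 -> p2, any other step -> 1.0
    step_value = group['step'].map({0: p12, 1: p2}).fillna(1.0)
    parts.append(group.assign(conversion_step=step_value))

# Concatenate once at the end rather than inside the loop
final = pd.concat(parts)
print(final)
```

This still loops in Python, but only once per issue rather than once per row, and it avoids the repeated pd.concat calls that copy the growing result on every iteration.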


Solution

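    import numpy as np

    cond0, cond1, cond2 = sub_test['step'].eq(0), sub_test['step'].eq(1), sub_test['step'].eq(2)
    # Per issue: product of the rates at steps 1 and 2 (where() aligns the full mask to each group; masked rows become NaN, which prod skips)
    s1 = sub_test.groupby('issue_id')['conversion_rate'].transform(lambda x: x.where(cond1 | cond2).prod())
    # Per issue: product of the rate(s) at step 2 (prod, like np.prod, gives 1 when there is no step 2)
    s2 = sub_test.groupby('issue_id')['conversion_rate'].transform(lambda x: x.where(cond2).prod())
    # Step 0 -> s1, step 1 -> s2, otherwise 1
    sub_test['conversion_step'] = np.select([cond0, cond1], [s1, s2], 1)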

    Output:

         issue_id  step  conversion_rate  conversion_step
    0  01-abc-234     0             0.45           0.0525
    1  01-abc-234     1             0.35           0.1500
    2  01-abc-234     2             0.15           1.0000
    3  01-abc-234     3             1.00           1.0000
    4  02-abc-234     0             0.05           0.0975
    5  02-abc-234     1             0.15           0.6500
    6  02-abc-234     2             0.65           1.0000
    7  02-abc-234     3             1.00           1.0000
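The lambdas passed to transform still run Python code once per group. A variant that may scale better on many issues (an assumption, not a measured result) replaces the rates that should not contribute with 1.0, which is neutral for multiplication, so that pandas' built-in 'prod' transform does the per-group work with no Python-level function calls. This is a minimal sketch assuming the same sub_test data; step, rate, keys, p12 and p2 are illustrative names:

```python
import numpy as np
import pandas as pd

# Sample data from the question
sub_test = pd.DataFrame(
    columns=['issue_id', 'step', 'conversion_rate'],
    data=[['01-abc-234', 0, 0.45], ['01-abc-234', 1, 0.35],
          ['01-abc-234', 2, 0.15], ['01-abc-234', 3, 1],
          ['02-abc-234', 0, 0.05], ['02-abc-234', 1, 0.15],
          ['02-abc-234', 2, 0.65], ['02-abc-234', 3, 1]],
)

step = sub_test['step']
rate = sub_test['conversion_rate']
keys = sub_test['issue_id']

# Keep only the rates that should contribute; everything else becomes 1.0,
# then take the per-issue product with the built-in 'prod' transform.
p12 = rate.where(step.isin([1, 2]), 1.0).groupby(keys).transform('prod')
p2 = rate.where(step.eq(2), 1.0).groupby(keys).transform('prod')

# Step 0 -> p12, step 1 -> p2, any other step -> 1.0
sub_test['conversion_step'] = np.select([step.eq(0), step.eq(1)], [p12, p2], default=1.0)
print(sub_test)
```

Because the masking happens on the whole column before grouping, an issue with no step 1 or step 2 rows still gets a product of 1.0, matching np.prod of an empty selection in the original function.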