python pandas dataframe group-by

Pandas Transforming the Applied Results back to the original dataframe


Consider the following DataFrame:

import pandas as pd

candy = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Annie', 'Annie', 'Annie', 'Daniel', 'Daniel', 'Daniel'],
    'Candy': ['Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies'],
    'Value': [15, 15, 10, 25, 30, 12, 40, 40, 16],
})

After reading this post, I am aware that apply() works on the whole DataFrame and transform() works on one Series at a time.

So if I want to append the total $ spend on candy per person, I can simply use the following:

candy['Total Spend'] = candy.groupby(['Name'])['Value'].transform('sum')

But if I need to append the total $ chocolate spend per person, it feels like I have no choice but to use apply() to create a separate dataframe and then merge it back, since transform() only works on a series.

chocolate = candy.groupby(['Name']).apply(lambda x: x[x['Candy'] == 'Chocolate']['Value'].sum()).reset_index(name = 'Total_Chocolate_Spend')
candy = pd.merge(candy, chocolate, how='left', on='Name')

While I don't mind writing the above code to solve this problem, is it possible to transform() the .apply()'d results back to the dataframe without having to create a separate dataframe and merge it?

What is actually happening when the transform() function is used? Is a separate series being stored in memory and then merged back by index, similar to what I have done in the apply-then-merge approach?


Solution

  • I do not have much to add to the excellent reference you provided on apply vs. transform, but you can do what you want without creating a separate dataframe. For example:

    candy.groupby(['Name']).apply(lambda x: x.assign(Total_Chocolate_Spend = x[x['Candy'] == 'Chocolate']['Value'].sum()))
    

    This uses assign on each group in the groupby to populate Total_Chocolate_Spend with the number you want. Note that it returns a new dataframe rather than modifying candy in place.
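
As a possible alternative (a sketch, not part of the answer above), the same column can probably be built with transform() alone by zeroing out the non-chocolate rows first. The names chocolate_value and total_spend are made up for illustration. The sketch also hints at what transform() does: it returns a Series with the same index as its input, so the assignment lines up row by row on the index rather than going through a merge.

```python
import pandas as pd

candy = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Annie', 'Annie', 'Annie', 'Daniel', 'Daniel', 'Daniel'],
    'Candy': ['Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies', 'Chocolate', 'Chocolate', 'Lollies'],
    'Value': [15, 15, 10, 25, 30, 12, 40, 40, 16],
})

# Keep Value where the row is chocolate, replace it with 0 elsewhere.
# (chocolate_value is an illustrative name, not from the original post.)
chocolate_value = candy['Value'].where(candy['Candy'] == 'Chocolate', 0)

# Group the masked Series by Name and broadcast each group's sum
# back onto every row of that group.
candy['Total_Chocolate_Spend'] = chocolate_value.groupby(candy['Name']).transform('sum')

# transform() returns a Series indexed like the original rows,
# which is why plain column assignment works without a merge.
total_spend = candy.groupby('Name')['Value'].transform('sum')
print(total_spend.index.equals(candy.index))
print(candy)
```

Whether this is preferable to the apply() version is a judgment call; it avoids the per-group lambda, at the cost of an intermediate masked Series.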