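import pandas as pd


def fun(output_data):
    # Default aggregation: keep the first value of every column.
    dic_ = dict.fromkeys(output_data.columns, "first")
    # The grouping keys must not appear in the aggregation map.
    dic_.pop("col1")
    dic_.pop("col2")
    dic_.update({
        "col9": "sum",
        "col10": "sum",
        "col11": "sum",
        "col12": "sum",
    })
    # Aggregate only the B2C rows, then restore the original column order.
    tmp = output_data[output_data['col100'].eq('B2C')].groupby(
        ['col1', 'col2'], sort=False, as_index=False).agg(dic_)[list(output_data.columns)].reset_index(
        drop=True)
    # Combine the aggregated B2C rows with the untouched non-B2C rows.
    output_data = pd.concat(
        [tmp,
         output_data[output_data['col100'].ne('B2C')]])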
I have a DataFrame that I need to filter, group, and then aggregate on certain columns. After concatenating the results, I want the change to apply to the DataFrame that is passed as the argument to the function. I tried the approach above but do not get the desired result, and pd.concat() has no inplace=True option.
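For reference, a minimal sketch of the general Python behaviour that seems to be involved here: assigning to a parameter name inside a function only rebinds the local name, while an in-place method changes the object the caller also holds. A plain list stands in for the DataFrame, and the function names rebind, mutate and rebuild are hypothetical, chosen only for illustration.

```python
def rebind(items):
    # Assigning to the parameter points the local name at a new object;
    # the caller's object is left untouched.
    items = items + ["new"]


def mutate(items):
    # An in-place method changes the object the caller also refers to.
    items.append("new")


def rebuild(items):
    # Returning the new object lets the caller rebind its own name.
    return items + ["new"]


data = ["old"]

rebind(data)
after_rebind = list(data)   # snapshot taken after the rebinding call

mutate(data)
after_mutate = list(data)   # snapshot taken after the in-place call

data = rebuild(data)        # the caller decides to keep the returned object
```

Under this reading, `output_data = pd.concat(...)` inside the function behaves like rebind, so returning the concatenated frame and reassigning it at the call site is the usual way around it.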
Example:
Input DataFrame
col1    col2    col3  col4  col5  ......  col100
fixval  fixval  12    'a'   'b'   ......  B2C
fixval  fixval  12    'a'   'c'   ......  B2C
fixval  fixval  12    'a'   'b'   ......  B2C
fixval  fixval  12    'a'   'b'   ......  B2C
fixval  fixval  12    'b'   'a'   ......  B2B
fixval  fixval  12    'b'   'a'   ......  B2B
Output DataFrame
col1    col2    col3  col4  col5  ......  col100
fixval  fixval  36    'a'   'b'   ......  B2C
fixval  fixval  12    'a'   'c'   ......  B2C
fixval  fixval  12    'b'   'a'   ......  B2B
fixval  fixval  12    'b'   'a'   ......  B2B
Grouping is done on col4 and col5, and the filter keeps only the rows where col100 equals B2C.
I then need to assign the result back to the original DataFrame, which is passed as the argument to the function.
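The example above can be sketched as a function that returns the combined frame instead of trying to modify its argument. This is only an illustration under stated assumptions: the name aggregate_b2c is hypothetical, the frame is cut down to the columns shown in the example, and the grouping keys (col4, col5) and the summed column (col3) follow the example tables rather than the col1/col2 and col9 to col12 choices in the code at the top.

```python
import pandas as pd


def aggregate_b2c(df):
    """Return a new frame: B2C rows grouped and aggregated, other rows unchanged."""
    is_b2c = df["col100"].eq("B2C")
    keys = ["col4", "col5"]  # assumption: grouping keys taken from the example
    # Keep the first value of every non-key column, except col3, which is summed.
    agg_map = {col: "first" for col in df.columns if col not in keys}
    agg_map["col3"] = "sum"
    grouped = (
        df[is_b2c]
        .groupby(keys, sort=False, as_index=False)
        .agg(agg_map)[list(df.columns)]  # restore the original column order
    )
    # Aggregated B2C rows first, then the untouched non-B2C rows.
    return pd.concat([grouped, df[~is_b2c]], ignore_index=True)


output_data = pd.DataFrame({
    "col1": ["fixval"] * 6,
    "col2": ["fixval"] * 6,
    "col3": [12] * 6,
    "col4": ["a", "a", "a", "a", "b", "b"],
    "col5": ["b", "c", "b", "b", "a", "a"],
    "col100": ["B2C"] * 4 + ["B2B"] * 2,
})

# The caller rebinds its own name to the returned frame.
output_data = aggregate_b2c(output_data)
print(output_data)
```

Returning the frame keeps the function free of side effects; the cost is that every call site has to reassign the result.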
Here is one solution I have found so far.
It may not be the best one, but it fulfills the need to update the DataFrame inside the function.
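output_data.drop(output_data.loc[output_data['col100'].eq('B2C')].index, inplace=True)
# Renumber the remaining rows so that len(output_data) is always an unused label.
output_data.reset_index(drop=True, inplace=True)
for idx, row in tmp.iterrows():
    output_data.loc[len(output_data)] = row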
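One detail worth checking in a drop-then-append approach like this: drop() keeps the labels of the surviving rows, so unless the index is renumbered, loc[len(df)] can land on a label that still exists and overwrite that row instead of appending. A minimal sketch, assuming a hypothetical two-column frame (kind, x) in which the B2C rows come first:

```python
import pandas as pd

# Hypothetical frame: the B2C rows come first, so the surviving labels are 2 and 3.
df = pd.DataFrame({"kind": ["B2C", "B2C", "B2B", "B2B"], "x": [1, 2, 3, 4]})
df.drop(df.loc[df["kind"].eq("B2C")].index, inplace=True)

collide = df.copy()
# len(collide) is 2 and label 2 still exists, so this overwrites a B2B row.
collide.loc[len(collide)] = ["B2C", 99]

safe = df.copy()
safe.reset_index(drop=True, inplace=True)  # labels become 0 and 1
# Label 2 is now unused, so this appends a new row.
safe.loc[len(safe)] = ["B2C", 99]

print(collide)
print(safe)
```

Appending row by row also gets slow on large frames, which is another reason returning the concatenated frame from the function may be preferable where the call site can be changed.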