When I want to apply the same function to multiple columns, I have to write out each column name and map it to that function one by one. This becomes tedious when there are many columns. In the code below, I map 3 columns to the same function ("first").
import pandas as pd

user_id = [12, 12, 13, 13, 13]
category = ["furniture", "furniture", "electronics","electronics","electronics"]
name = ["Casey", "Casey", "Alice", "Alice", "Alice"]
payment_amount = [96, 109, 56, 0, 90]
example_df = pd.DataFrame({"user_id" : user_id, "category" : category, "name" : name, "payment_amount": payment_amount})
expected_output = example_df.groupby("user_id").agg({"user_id": "first", "category": "first", "name": "first", "payment_amount": "sum"})
Instead, I want to do something like this and get the same output:
expected_output = example_df.groupby("user_id").agg({["user_id" , "category" , "name"]: "first", "payment_amount": sum})
But this throws an error, because a list is unhashable and cannot be used as a dictionary key. How can this be done?
You can generate the dict:
d = {"payment_amount": "sum",
     **dict.fromkeys(["user_id", "category", "name"], "first")}
print (d)
{'payment_amount': 'sum', 'user_id': 'first', 'category': 'first', 'name': 'first'}
expected_output = example_df.groupby("user_id").agg(d)
A more general solution is to map every column to 'first' and then override the exceptions:
d = dict.fromkeys(example_df.columns, 'first')
d['payment_amount'] = 'sum'
print (d)
{'user_id': 'first', 'category': 'first', 'name': 'first', 'payment_amount': 'sum'}
expected_output = example_df.groupby("user_id").agg(d)
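If you want to keep the grouped syntax from the question, one option is a small helper that flattens it into a regular dict. This is only a sketch: `expand_agg_spec` is a hypothetical name, not part of pandas, and it assumes column names are strings and that groups of columns are written as tuples (which, unlike lists, are hashable and so can be dict keys). The group key `user_id` is left out of the spec here because it already becomes the index of the result.

```python
import pandas as pd


def expand_agg_spec(spec):
    """Flatten {column or tuple of columns: func} into {column: func}.

    Hypothetical helper for illustration; not a pandas API.
    """
    flat = {}
    for cols, func in spec.items():
        # A bare string is treated as a single column name.
        if isinstance(cols, str):
            cols = (cols,)
        for col in cols:
            flat[col] = func
    return flat


example_df = pd.DataFrame({
    "user_id": [12, 12, 13, 13, 13],
    "category": ["furniture", "furniture", "electronics", "electronics", "electronics"],
    "name": ["Casey", "Casey", "Alice", "Alice", "Alice"],
    "payment_amount": [96, 109, 56, 0, 90],
})

# Tuple keys group the columns that share an aggregation function.
agg_spec = expand_agg_spec({("category", "name"): "first", "payment_amount": "sum"})
result = example_df.groupby("user_id").agg(agg_spec)
print(result)
```

The helper only builds the dict, so the `agg` call itself stays exactly as in the snippets above.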