pandas group by with reset index

Given I have following pandas dataframe

UserID	LoginDate	PlacedOrderItems	Other column
A123456	2023-04-01	1	a
A123456	2023-04-10	0	b
B123456	2023-04-05	0	c
B123456	2023-04-06	2	d
B123456	2023-04-07	0	e

I want to use group by UserID and get min of LoginDate and max of PlacedOrderItems added to each row of my datafarme. I tried Group by but turns the column names into a weird multi level indexing.

But What I want is something like below where I preserve the original dataframe and just get two new columns added.

UserID	LoginDate	PlacedOrderItems	Other column	min_login_date	max_order_items
A123456	2023-04-01	1	a	2023-04-01	1
A123456	2023-04-10	0	b	2023-04-01	1
B123456	2023-04-05	0	c	2023-04-05	2
B123456	2023-04-06	2	d	2023-04-05	2
B123456	2023-04-07	1	e	2023-04-05	2

Solution

You can achieve this by using the groupby() method with the transform() method to create two new columns: min_login_date and max_order_items:

import pandas as pd


# set display options
pd.set_option('display.max_columns', 10)
pd.set_option('display.width', 100)

# create the dataframe
data = {'UserID': ['A123456', 'A123456', 'B123456', 'B123456', 'B123456'],
        'LoginDate': ['2023-04-01', '2023-04-10', '2023-04-05', '2023-04-06', '2023-04-07'],
        'PlacedOrderItems': [1, 0, 0, 2, 0],
        'OtherColumn': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)

# convert LoginDate to datetime format
df['LoginDate'] = pd.to_datetime(df['LoginDate'])

# group by UserID and transform to get the min LoginDate and max PlacedOrderItems
df['min_login_date'] = df.groupby('UserID')['LoginDate'].transform('min')
df['max_order_items'] = df.groupby('UserID')['PlacedOrderItems'].transform('max')

# group by all columns and take the first row of each group
group_cols = ['UserID', 'LoginDate', 'PlacedOrderItems', 'OtherColumn', 'min_login_date', 'max_order_items']
df = df.groupby(group_cols).first().reset_index()

# print the resulting dataframe
print(df.to_string(index=False))