I have the below input from excel file (Sheet1 and Sheet2)
Sheet1:
Order ID | Order Date | Segment | Sales
1001 11-11-2016 Consumer 100
1001 11-11-2016 Consumer 200
2001 16-06-2016 Consumer 300
Sheet2:
Returned | Order ID
Yes 1001
And i'm using the below code in python where I'm using the inner join and groupby to get only the matching records from both the sheets
import pandas as pd
Sheet1 = pd.read_excel (r"C:\Users\Bharath Shana\Desktop\Python\sample data.xlsx", sheet_name='Sheet1')
Sheet2 = pd.read_excel (r"C:\Users\Bharath Shana\Desktop\Python\sample data.xlsx", sheet_name='Sheet2')
Order_Year = pd.DatetimeIndex(Sheet1['Order Date']).year
Sheet1.merge(Sheet2, on='Order ID', how='inner')
Sheet1.groupby(['Order ID',Order_Year, 'Segment'])['Sales'].sum()
Output:
As you can see in the above output instead of showing the matching records it is showing all the records and i want the output like below.
Required Output:
Can someone please help me how to modify my above python code to get the Required Output.
Regards,
Vikas
Let's try this,
print(
sheet1[sheet1['Order ID'].isin(sheet2['Order ID'])]
.assign(Year=pd.to_datetime(sheet1['Order Date']).dt.year)
.groupby(['Order ID', 'Segment', 'Year'])['Sales'].sum()
.reset_index(name="Sales_Sum")
)
Order ID Segment Year Sales_Sum
0 1001 Consumer 2016 300