Search code examples
pythonpandaspandas-groupbypython-3.7

How to use Inner Join and groupby in the same python code


I have the below input from excel file (Sheet1 and Sheet2)

Sheet1:

Order ID | Order Date | Segment | Sales

1001 11-11-2016 Consumer 100

1001 11-11-2016 Consumer 200

2001 16-06-2016 Consumer 300

Sheet2:

Returned | Order ID

Yes 1001

And i'm using the below code in python where I'm using the inner join and groupby to get only the matching records from both the sheets

import pandas as pd

Sheet1 = pd.read_excel (r"C:\Users\Bharath Shana\Desktop\Python\sample data.xlsx", sheet_name='Sheet1')

Sheet2 = pd.read_excel (r"C:\Users\Bharath Shana\Desktop\Python\sample data.xlsx", sheet_name='Sheet2')

Order_Year = pd.DatetimeIndex(Sheet1['Order Date']).year

Sheet1.merge(Sheet2, on='Order ID', how='inner')

Sheet1.groupby(['Order ID',Order_Year, 'Segment'])['Sales'].sum()

Output:

enter image description here

As you can see in the above output instead of showing the matching records it is showing all the records and i want the output like below.

Required Output:

enter image description here

Can someone please help me how to modify my above python code to get the Required Output.

Regards,

Vikas


Solution

  • Let's try this,

    print(
        sheet1[sheet1['Order ID'].isin(sheet2['Order ID'])]
            .assign(Year=pd.to_datetime(sheet1['Order Date']).dt.year)
            .groupby(['Order ID', 'Segment', 'Year'])['Sales'].sum()
            .reset_index(name="Sales_Sum")
    )
    

       Order ID   Segment  Year  Sales_Sum
    0      1001  Consumer  2016        300