I am facing some difficulties using merge function in Pandas. I am looking for some kind of Vlookup formula for this. However, I couldn't solve my problem.
My data is huge and I couldn't share here due to confidentiality. However, I try to came up with similar data here.
Old Code | New Code | Name | Invoice Date |
---|---|---|---|
1001011 | NA | Cheese Cake | 02/02/2021 |
1001012 | NA | Coffee | 03/05/2021 |
1001011 | NA | Cheese Cake | 30/05/2021 |
NA | 2002093 | Jasmine Tea | 21/08/2021 |
NA | 2002042 | Cookies | 31/12/2021 |
NA | 2002080 | Coffee | 09/01/2022 |
NA | 2002093 | Jasmine Tea | 05/05/2022 |
NA | 2002058 | Cheese Cake | 07/06/2022 |
I would like to have a COST Column input in my table above. However, the cost is very by invoice date (Also take note on the changing of product code). We have 2 cost table. For year 2021:
Old Code | New Code | Name | Jan-21 | Feb-21 | Mar-21 | Apr-21 | May-21 | June-21 | Jul-21 | Aug-21 | Sep-21 | Oct-21 | Nov-21 | Dec-21 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1001011 | 2002058 | Cheese Cake | 50 | 51 | 50 | 53 | 54 | 52 | 55 | 53 | 50 | 52 | 53 | 53 |
1001012 | 2002080 | Coffee | 5 | 6 | 5 | 6 | 6 | 5 | 7 | 5 | 6 | 5 | 6 | 6 |
1001015 | 2002093 | Jasmine Tea | 4 | 3 | 3 | 4 | 4 | 3 | 5 | 3 | 3 | 3 | 3 | 4 |
1001020 | 2002042 | Cookies | 20 | 20 | 21 | 20 | 22 | 20 | 21 | 20 | 22 | 20 | 21 | 22 |
And also for Year 2022:
Old Code | New Code | Name | Jan-22 | Feb-22 | Mar-22 | Apr-22 | May-22 | June-22 | Jul-22 | Aug-22 | Sep-22 | Oct-22 | Nov-22 | Dec-22 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1001011 | 2002058 | Cheese Cake | 52 | 52 | 55 | 55 | 56 | 52 | NA | NA | NA | NA | NA | NA |
1001012 | 2002080 | Coffee | 5 | 6 | 5 | 6 | 6 | 6.5 | NA | NA | NA | NA | NA | NA |
1001015 | 2002093 | Jasmine Tea | 4 | 3 | 3 | 5 | 5 | 5.5 | NA | NA | NA | NA | NA | NA |
1001020 | 2002042 | Cookies | 22 | 22 | 23 | 23 | 23.5 | 23 | NA | NA | NA | NA | NA | NA |
So basically, I would like to have my cost column in my first Data Frame to reflect the correct costing for different Year and different Month.
Example:
Invoice Date Costing for 03/05/2021 = May_2021
You need to have the month and code number on both sides when merging, so:
import pandas as pd
import io
import datetime
invoice_data_text = '''Old Code New Code Name Invoice Date
1001011 NA Cheese Cake 02/02/2021
1001012 NA Coffee 03/05/2021
1001011 NA Cheese Cake 30/05/2021
NA 2002093 Jasmine Tea 21/08/2021
NA 2002042 Cookies 31/12/2021
NA 2002080 Coffee 09/01/2022
NA 2002093 Jasmine Tea 05/05/2022
NA 2002058 Cheese Cake 07/06/2022
'''
cost_2021_text = '''
Old Code New Code Name Jan-21 Feb-21 Mar-21 Apr-21 May-21 June-21 Jul-21 Aug-21 Sep-21 Oct-21 Nov-21 Dec-21
1001011 2002058 Cheese Cake 50 51 50 53 54 52 55 53 50 52 53 53
1001012 2002080 Coffee 5 6 5 6 6 5 7 5 6 5 6 6
1001015 2002093 Jasmine Tea 4 3 3 4 4 3 5 3 3 3 3 4
1001020 2002042 Cookies 20 20 21 20 22 20 21 20 22 20 21 22
'''
cost_2022_text = '''
Old Code New Code Name Jan-22 Feb-22 Mar-22 Apr-22 May-22 June-22 Jul-22 Aug-22 Sep-22 Oct-22 Nov-22 Dec-22
1001011 2002058 Cheese Cake 52 52 55 55 56 52 NA NA NA NA NA NA
1001012 2002080 Coffee 5 6 5 6 6 6.5 NA NA NA NA NA NA
1001015 2002093 Jasmine Tea 4 3 3 5 5 5.5 NA NA NA NA NA NA
1001020 2002042 Cookies 22 22 23 23 23.5 23 NA NA NA NA NA NA
'''
# Prepare
invoice_df = pd.read_csv(io.StringIO(invoice_data_text),sep="\t",parse_dates=["Invoice Date"])
cost21 = pd.read_csv(io.StringIO(cost_2021_text),sep='\t')
cost22 = pd.read_csv(io.StringIO(cost_2022_text),sep='\t')
# Create Month column for merging
invoice_df["Month"] = invoice_df["Invoice Date"].map(lambda x:datetime.datetime.strftime(x,"%b-%y"))
# Combine two cost tables
cost21_stack = cost21.set_index(list(cost21.columns[:3])).stack().reset_index(name="Cost")
cost22_stack = cost22.set_index(list(cost22.columns[:3])).stack().reset_index(name="Cost")
cost_table = pd.concat([cost21_stack,cost22_stack]).rename({"level_3":"Month"},axis=1)
# Merge with new code and old code respectively
old_code_result = pd.merge(invoice_df[pd.isna(invoice_df["Old Code"]) == False], cost_table[["Old Code","Month","Cost"]], on=["Old Code","Month"] ,how="left")
new_code_result = pd.merge(invoice_df[pd.isna(invoice_df["New Code"]) == False], cost_table[["New Code","Month","Cost"]], on=["New Code","Month"] ,how="left")
# Combine result
pd.concat([old_code_result,new_code_result])