I'm trying to replicate an excel type of "index/match" in python. I have two dfs.
df1:
RIC | Sector Code | SP500 | SP400 | SP600 |
---|---|---|---|---|
FI | 40 | SP500 | ||
BRBR | 30 | SP400 | ||
XPEL | 25 | SP600 |
df2: (contains lookup table)
Code | Name | SP400 | SP500 | SP600 |
---|---|---|---|---|
25 | Consumer Discretionary | .SPMDCD | .SPSMCD | .SPLRCD |
30 | Consumer Staples | .SPMDCS | .SPSMCS | .SPLRCS |
10 | Energy | .SPMDCE | .SPSMCE | .SPNY |
40 | Financials | .SPMDCF | .SPSMCF | .SPSY |
35 | Health Care | .SPMDCA | .SPSMCA | .SPXHC |
20 | Industrials | .SPMDCI | .SPSMCI | .SPLRCI |
45 | Information Technology | .SPMDCT | .SPSMCT | .SPLRCT |
15 | Materials | .SPMDCM | .SPSMCM | .SPLRCM |
50 | Telecommunication Services | .SPMDCL | .SPSMCL | .SPLRCL |
55 | Utilities | .SPMDCU | .SPSMCU | .SPLRCU |
Desired Result: (Adds the result to df1)
RIC | Sector Code | SP500 | SP400 | SP600 | RESULT SP500 | RESULT SP400 | RESULT SP600 |
---|---|---|---|---|---|---|---|
FI | 40 | SP500 | .SPSMCF | ||||
BRBR | 30 | SP400 | .SPMDCS | ||||
XPEL | 25 | SP600 | .SPLRCD |
I tried
merged_df = df1.merge(df2, left_on=['Sector Code'], right_on=['Code'], how='left').
This combined df1 and df2, but it give me every match (SP500, SP400, and SP600) instead of just the one match for each.
This Code is Most Efficient for huge Datasets.
import pandas as pd
df1 = pd.DataFrame([
{'RIC': 'FI', 'Sector Code': 40, 'SP500': 'SP500', 'SP400': '', 'SP600': ''},
{'RIC': 'BRBR', 'Sector Code': 30, 'SP400': 'SP400', 'SP500': '', 'SP600': ''},
{'RIC': 'XPEL', 'Sector Code': 25, 'SP400': '', 'SP500': '', 'SP600': 'SP600'}])
df2 = pd.DataFrame({
'Code': [25, 30, 10, 40, 35, 20, 45, 15, 50, 55],
'Name': ['Consumer Discretionary', 'Consumer Staples', 'Energy', 'Financials',
'Health Care', 'Industrials', 'Information Technology', 'Materials',
'Telecommunication Services', 'Utilities'],
'SP400': ['.SPMDCD', '.SPMDCS', '.SPMDCE', '.SPMDCF', '.SPMDCA', '.SPMDCI',
'.SPMDCT', '.SPMDCM', '.SPMDCL', '.SPMDCU'],
'SP500': ['.SPSMCD', '.SPSMCS', '.SPSMCE', '.SPSMCF', '.SPSMCA', '.SPSMCI',
'.SPSMCT', '.SPSMCM', '.SPSMCL', '.SPSMCU'],
'SP600': ['.SPLRCD', '.SPLRCS', '.SPNY', '.SPSY', '.SPXHC', '.SPLRCI',
'.SPLRCT', '.SPLRCM', '.SPLRCL', '.SPLRCU']})
"""
print(df1)
RIC Sector Code SP500 SP400 SP600
0 FI 40 SP500
1 BRBR 30 SP400
2 XPEL 25 SP600
print(df2)
Code Name SP400 SP500 SP600
0 25 Consumer Discretionary .SPMDCD .SPSMCD .SPLRCD
1 30 Consumer Staples .SPMDCS .SPSMCS .SPLRCS
2 10 Energy .SPMDCE .SPSMCE .SPNY
3 40 Financials .SPMDCF .SPSMCF .SPSY
4 35 Health Care .SPMDCA .SPSMCA .SPXHC
5 20 Industrials .SPMDCI .SPSMCI .SPLRCI
6 45 Information Technology .SPMDCT .SPSMCT .SPLRCT
7 15 Materials .SPMDCM .SPSMCM .SPLRCM
8 50 Telecommunication Services .SPMDCL .SPSMCL .SPLRCL
9 55 Utilities .SPMDCU .SPSMCU .SPLRCU
"""
sp_cols = [col for col in df2.filter(like='SP')]
res_dict ={}
for col in sp_cols :
res_dict[f'RESULT_{col}'] = df1['Sector Code'].apply(
lambda d : df2.set_index('Code').loc[d,col] if d in df2['Code'].tolist() else np.nan
)
#print(res_dict[f'RESULT_{col}'])
df1_join_res_dict = df1.join(pd.DataFrame(res_dict))
print(df1_join_res_dict.to_string())
"""
RIC Sector Code SP500 SP400 SP600 RESULT_SP400 RESULT_SP500 RESULT_SP600
0 FI 40 SP500 .SPMDCF .SPSMCF .SPSY
1 BRBR 30 SP400 .SPMDCS .SPSMCS .SPLRCS
2 XPEL 25 SP600 .SPMDCD .SPSMCD .SPLRCD
"""