I have a dataframe like as below
data_df = pd.DataFrame({'p_id': ['abc@gmail.com','abc@gmail.com','abc@gmail.com','ace@gmail.com','ace@gmail.com','pqr@gmail.com','pqr@gmail.com'],
'company': ['a','b','c','d','e','f','g'],
'dept_access':['a1','a1','a1','a1','a2','a2','a2']})
key_df = pd.DataFrame({'p_id': ['abc@gmail.com','xyz@gmail.com','pqr@gmail.com'],
'company': ['a','c','b'],
'location':['UK','USA','KOREA']})
I would like to do the below
a) Attach the location
column from key_df
to data_df
based on two fields - p_id
and company
So, I tried the below
loc = key_df.drop_duplicates(['p_id','company']).set_index(['p_id','company'])['location']
data_df['location'] = data_df[['p_id','company']].map(loc)
But this resulted in error like below
KeyError: "None of [Index(['p_id','company'], dtype='object')] are in the [columns]"
How can I map based on multiple index columns? I don't wish to use merge
Merge can be used for a lot, so let's first try to use it:
data_df.merge(key_df, on=['p_id', 'company'], how="left")
p_id company dept_access location
0 abc@gmail.com a a1 UK
1 abc@gmail.com b a1 NaN
2 abc@gmail.com c a1 NaN
3 ace@gmail.com d a1 NaN
4 ace@gmail.com e a2 NaN
5 pqr@gmail.com f a2 NaN
6 pqr@gmail.com g a2 NaN
You can also do this by mapping the index like this:
idx = ['p_id', 'company']
data_df.assign(location=data_df.set_index(idx).index.map(key_df.set_index(idx)['location']))
p_id company dept_access location
0 abc@gmail.com a a1 UK
1 abc@gmail.com b a1 NaN
2 abc@gmail.com c a1 NaN
3 ace@gmail.com d a1 NaN
4 ace@gmail.com e a2 NaN
5 pqr@gmail.com f a2 NaN
6 pqr@gmail.com g a2 NaN