I have a dataframe where two columns, User
and Code
, have their values mixed or not correctly filled.
df = pd.DataFrame({"User":['Dan', 'Mary','003','010','Luke','Peter'],"Code":['001','035','003','Martin','Luke','AAA'],"Job":['astronaut','biologist','director','footballer','waiter']})
df
User Code Job
Dan 001 astronaut
Mary 035 biologist
003 003 director
010 Martin footballer
Luke Luke waiter
Peter AAA unkown
Given that I have a second dataframe - dictionary with the mapping of users and codes:
df_dict = pd.DataFrame({"name":['Dan','Paul','Julia','Mary','Martin','George','Daniel','Luke','Marina'],"code":['001','045','012','035','010','003','200','501']})
df_dict
name code
Dan 001
Paul 045
Julia 012
Mary 035
Martin 010
George 003
Daniel 200
Marina 501
how can I do the mapping so that the first dataframe df
looks like:
User Code Job
Dan 001 astronaut
Mary 035 biologist
George 003 director
Martin 010 footballer
Luke 200 waiter
Peter AAA unknown
Notice that if a value is not in the df_dict
(like 'Peter' in this example) the record should remain as in the original df
Create Series
for mapping by first columns in Series.map
with replace possible missing values by map by another columns, reasign columns in DataFrame.assign
and replace missing values by original values by DataFrame.fillna
:
d1 = df_dict.set_index('name')['code']
d2 = df_dict.set_index('code')['name']
c = df['User'].map(d1).fillna(df['Code'].map(d1))
u = df['Code'].map(d2).fillna(df['User'].map(d2))
df = df.assign(Code=c, User=u).fillna(df)
print (df)
User Code Job
0 Dan 001 astronaut
1 Mary 035 biologist
2 George 003 director
3 Martin 010 footballer
4 Luke 200 waiter
5 Peter AAA unknown