Search code examples
pythonpandasdataframemapping

Map two mixed columns with another dataframe


I have a dataframe where two columns, User and Code, have their values mixed or not correctly filled.

df = pd.DataFrame({"User":['Dan', 'Mary','003','010','Luke','Peter'],"Code":['001','035','003','Martin','Luke','AAA'],"Job":['astronaut','biologist','director','footballer','waiter']})

df

User   Code   Job
Dan    001    astronaut
Mary   035    biologist
003    003    director
010   Martin  footballer
Luke   Luke   waiter
Peter  AAA    unkown

Given that I have a second dataframe - dictionary with the mapping of users and codes:

df_dict = pd.DataFrame({"name":['Dan','Paul','Julia','Mary','Martin','George','Daniel','Luke','Marina'],"code":['001','045','012','035','010','003','200','501']})

df_dict

name     code
Dan       001
Paul      045
Julia     012
Mary      035
Martin    010
George    003
Daniel    200
Marina    501

how can I do the mapping so that the first dataframe df looks like:

User   Code   Job
Dan    001    astronaut
Mary   035    biologist
George 003    director
Martin 010    footballer
Luke   200    waiter
Peter  AAA    unknown

Notice that if a value is not in the df_dict (like 'Peter' in this example) the record should remain as in the original df


Solution

  • Create Series for mapping by first columns in Series.map with replace possible missing values by map by another columns, reasign columns in DataFrame.assign and replace missing values by original values by DataFrame.fillna:

    d1 = df_dict.set_index('name')['code']
    d2 = df_dict.set_index('code')['name']
    
    c = df['User'].map(d1).fillna(df['Code'].map(d1))
    u = df['Code'].map(d2).fillna(df['User'].map(d2))
    
    df = df.assign(Code=c, User=u).fillna(df)
    print (df)
         User Code         Job
    0     Dan  001   astronaut
    1    Mary  035   biologist
    2  George  003    director
    3  Martin  010  footballer
    4    Luke  200      waiter
    5   Peter  AAA     unknown