Search code examples
pythonpandasdataframenumpydata-analysis

How to replace a column value with a value from dictionary based on another column value


I have the following Data Frame:

ID    Unique_value    
red_1        0  
red_2        1
blue_1       1                
blue_2       2           
blue_3       2             
blue_4       3         
blue_4       1          

There are two dictionaries for mapping for red and blue ID's :

red_dict = {0: "A", 1: "B", 2: "C"}
blue_dict = {0: "D", 1: "B", 2: "E", 3: "F"}

Desired Output is:

ID    Unique_value    
red_1        A  
red_2        B
blue_1       B                
blue_2       E           
blue_3       E             
blue_4       F         
blue_4       B          

What I have done so far:

  1. Created two dictionaries.
  2. In a for loop, loop through each value in ID column, if the ID value is related to red, then replace the unique_value column with another value.
  3. I am stuck in how to replace that one value from a dictionary of values.

Solution

  • I would extract the key (red/blue/…) with a regex and use a dictionary of dictionaries and groupby.apply to map the values:

    mapper = {'red': {0: "A", 1: "B", 2: "C"},
              'blue': {0: "D", 1: "B", 2: "E", 3: "F"}}
    
    df['Unique_value'] = (df.groupby(df['ID'].str.extract('([^_]+)', expand=False),
                                     group_keys=False)['Unique_value']
                            .apply(lambda g: g.map(mapper.get(g.name, {})))
                         )
    

    Or using a list comprehension with zip:

    mapper = {'red': {0: "A", 1: "B", 2: "C"},
              'blue': {0: "D", 1: "B", 2: "E", 3: "F"}}
    
    df['Unique_value'] = [mapper.get(d, {}).get(v) for d, v in 
                          zip(df['ID'].str.extract('([^_]+)', expand=False),
                              df['Unique_value'])]
    

    If you expect a unique match for each row, it's also possible to craft a DataFrame, stack and merge:

    mapper = {'red': {0: "A", 1: "B", 2: "C"},
              'blue': {0: "D", 1: "B", 2: "E", 3: "F"}}
    
    df['Unique_value'] = df.merge(pd.DataFrame(mapper).stack().rename('X'),
                                  left_on=['Unique_value',
                                           df['ID'].str.extract('([^_]+)', expand=False)],
                                  right_index=True, how='left')['X'].values
    

    Output:

           ID Unique_value
    0   red_1            A
    1   red_2            B
    2  blue_1            B
    3  blue_2            E
    4  blue_3            E
    5  blue_4            F
    6  blue_4            B