
Returning a column's value as the default in NumPy conditional logic (np.select)


I have a dataframe along the lines of the below:

   COLOR      
1  grey       
2  white      
3  black      
4  orange     
5  pink       
6  red        
7  blue       
8  green      

I want to add a new column that assigns a color based on specific conditions and keeps the existing color when no condition is satisfied. Here's the final result I want:

       COLOR      FINAL_COLOR
    1  grey       black
    2  white      black
    3  black      black
    4  orange     red
    5  pink       red
    6  red        red
    7  blue       blue
    8  green      green

Here's what I tried:

condition = [(df['color'] == 'grey'),
             (df['color'] == 'white'),
             (df['color'] == 'orange'),
             (df['color'] == 'pink')]
result = ['black','red']
df['final_color'] = np.select(condition,result,default = df['color'])
         

Solution

  • This does what you're asking.

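    import pandas as pd

    def get_final_color(val):
        # Normalize case so 'Grey' and 'grey' are treated alike.
        color = val.lower()
        if color in ['grey', 'white', 'black']:
            return 'black'
        elif color in ['orange', 'pink', 'red']:
            return 'red'
        else:
            # No rule matched: keep the existing color.
            return color

    df = pd.DataFrame(
        data = {'COLOR': ['grey', 'white', 'black', 'orange', 'pink', 'red', 'blue', 'green']})
    # Apply the mapping to every value in the COLOR column.
    df['FINAL_COLOR'] = df['COLOR'].apply(get_final_color)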
    

    Depending on what you're trying to do, this may be all you need. If not, please give more details so I can make this answer more helpful!
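
If you would rather stay with np.select, as in the question, the following is a minimal sketch of one way to do it. The attempt in the question passes four conditions but only two results, and np.select needs those two lists to have the same length. It also looks up df['color'] while the column is named COLOR. The sketch assumes the column name COLOR from the sample data and groups the colors with isin so that there is one condition per output value. The names conditions and choices are my own, not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'COLOR': ['grey', 'white', 'black', 'orange', 'pink', 'red', 'blue', 'green']})

# One condition per output value, so both lists have the same length.
conditions = [
    df['COLOR'].isin(['grey', 'white']),
    df['COLOR'].isin(['orange', 'pink']),
]
choices = ['black', 'red']

# Rows that match no condition fall back to their existing COLOR value.
df['FINAL_COLOR'] = np.select(conditions, choices, default=df['COLOR'])
```

Because the default is the COLOR column itself, 'black' and 'red' do not need conditions of their own: they already hold the value you want. Unlike the apply version above, this sketch does not lowercase the input, so it only matches the exact spellings listed.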