Search code examples
pythondataframefunctionconditional-statementsmultiple-columns

How to define a function based on vaues of multiple columns


I have a datafile as follows:

Activity Hazard Condition consequence
welding fire chamber high
painting falling none medium

I need to create a fifth column based on the values in the activity, hazard, condition, and consequence columns. The conditions are as follows

if the activity column includes "working" or "performing" return 'none' if the condition column includes 'none' return 'none' else return datafile.Hazard.map(str) + " is " + datafile.Condition.map(str) + " impact " datafile.consequence.map(str)

I wrote the following code using regular expressions and dictionaries. But it didn't provide an answer. It is really appreciated if someone can give an answer

Dt1 = ['working','performing duties']
Dt2 = ["none"]

Dt1_regex = "|".join(Dt1)
Dt2_regex = "|".join(Dt2)

def clean(x):
    if datafile.Activity.str.contains (Dt1_regex, regex=True) | datafile.Condition.str.contains(Dt2_regex, regex=True):
        return 'none'
    else:
        return datafile.Hazard.map(str) + " is " + datafile.Condition.map(str)" impact " datafile.consequence.map(str)

datafile['combined'] = datafile.apply(clean)

Solution

  • You can create a bunch of conditions, and a list of the associated values when the conditions are true. You can then create the new column as shown in the example below:

    Conditions:

    • When Activity == 'working' OR Activity == 'performing' - set to none
    • When Condition == 'none' - set to none

    Otherwise set to:

    df.Hazard + ' is ' + df.Condition + ' impact ' + df.consequence

    Code:

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({ 'Activity': ['welding', 'painting', 'working'],
                        'Hazard': ['fire', 'falling', 'drowning'],
                        'Condition': ['chamber', 'none', 'wet'],
                        'consequence': ['high', 'medium', 'high']})
    
    # Create your conditions
    conditions = [
        (df['Activity'] == 'working') | (df['Activity'] == 'performing'),
        (df['Condition'] == 'none')
        ]
    
    # create a list of the values we want to assign for each condition
    values = ['none', 'none']
    
    # Create the new column based on the conditions and values
    df['combined'] = np.select(conditions, values, default = "x")
    df.loc[df['combined'] == 'x', 'combined'] = df.Hazard + ' is ' + df.Condition + " impact " + df.consequence
    
    print(df)
    

    Output:

    Dataframe