
Problem with applying a function with some conditions on 3 columns of a python dataframe


I have a dataframe with 3 columns. The first column, named "insured_relationship", takes the values ['own_child', 'wife', 'husband', 'unmarried', 'other_relationship']. The second column, named "insured_sex", takes the values ['Male', 'Female']. The third, named "incident_hour_of_the_day", takes the integer values [0, 1, 2, 3, ..., 23]. I created a function named mar_status() with some conditions on these 3 columns, in order to create a new column (variable) named "marital_status" in the dataframe. But when I use the "apply" method on the 3 columns, I get the following error message from my function:

TypeError: mar_status() got multiple values for argument 'col1'

Note that col1 corresponds to the "insured_relationship" column of my dataframe.

This is the function which I created:

def mar_status(col1, col2, col3):
    # col1: insured_relationship, col2: insured_sex, col3: incident_hour_of_the_day
    if col1 == 'own_child' and col3 in range(13):
        return 'unmarried'
    elif col1 == 'own_child' and col3 in range(13, 24):
        return 'divorced'
    elif col1 == 'wife' and col2 == 'Male':
        return 'married'
    elif col1 == 'husband' and col2 == 'Female':
        return 'married'
    elif col1 == 'unmarried':
        return 'unmarried'
    elif col1 == 'other_relationship':
        return 'in relationship'
    elif col1 == 'out_of_family' and col2 == 'Male':
        return 'widower'
    elif col1 == 'out_of_family' and col2 == 'Female':
        return 'widow'
    # Any other combination (e.g. 'wife' with 'Female') falls through and returns None.

And the "apply" method:

df['marital_status'] = df.apply(mar_status,col1 ='insured_relationship',col2 ='insured_sex',
                                col3 = 'incident_hour_of_the_day',axis = 1)

I expected to create a new variable named "marital_status" that takes the values ['unmarried', 'divorced', 'married', 'in relationship', 'widow', 'widower'].

The function itself works, but applying it to the dataframe does not. How can I achieve the desired outcome?
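A minimal reproduction of the error, as a sketch. It assumes a tiny made-up DataFrame with the question's three column names, and it uses a hypothetical stand-in function, three_args, that has the same (col1, col2, col3) signature as mar_status. The "apply" call is the same shape as the one in the question.

```python
import pandas as pd


# Hypothetical stand-in with the same signature as mar_status(col1, col2, col3).
def three_args(col1, col2, col3):
    return col1


# Made-up sample data using the question's column names.
df = pd.DataFrame({
    "insured_relationship": ["own_child", "wife"],
    "insured_sex": ["Female", "Male"],
    "incident_hour_of_the_day": [2, 15],
})

# With axis=1, apply effectively calls three_args(row, col1=..., col2=..., col3=...):
# the row fills col1 positionally, and col1= then supplies it a second time.
error_message = None
try:
    df.apply(three_args, col1="insured_relationship", col2="insured_sex",
             col3="incident_hour_of_the_day", axis=1)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Because the stand-in has the same parameter names, the message matches the one in the question apart from the function name.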


Solution

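  • The error comes from how "apply" calls your function. With axis=1, DataFrame.apply passes each row (a Series) as the first positional argument and forwards any extra keyword arguments unchanged, so it ends up calling mar_status(row, col1='insured_relationship', col2='insured_sex', col3='incident_hour_of_the_day'). The row is already bound to col1, and col1= supplies it a second time, hence the TypeError. Even without that clash, the keywords would hand your function the column names rather than the row's values. Try it like this instead, pulling the values out of each row in a lambda: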

    df['marital_status'] = df.apply(lambda x: mar_status(x['insured_relationship'], x['insured_sex'], x['incident_hour_of_the_day']), axis=1)
    
    insured_relationship  insured_sex  incident_hour_of_the_day  marital_status
    own_child             Female       2                         unmarried
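If you prefer to keep passing the column names as keyword arguments to "apply", another option is a small adapter whose first parameter is the row. The sketch below assumes a hypothetical adapter named mar_status_row with hypothetical keyword names rel_col, sex_col and hour_col, plus made-up sample rows; mar_status is the question's function, unchanged apart from formatting.

```python
import pandas as pd


def mar_status(col1, col2, col3):
    # The question's function: col1 = relationship, col2 = sex, col3 = hour.
    if col1 == 'own_child' and col3 in range(13):
        return 'unmarried'
    elif col1 == 'own_child' and col3 in range(13, 24):
        return 'divorced'
    elif col1 == 'wife' and col2 == 'Male':
        return 'married'
    elif col1 == 'husband' and col2 == 'Female':
        return 'married'
    elif col1 == 'unmarried':
        return 'unmarried'
    elif col1 == 'other_relationship':
        return 'in relationship'
    elif col1 == 'out_of_family' and col2 == 'Male':
        return 'widower'
    elif col1 == 'out_of_family' and col2 == 'Female':
        return 'widow'
    return None  # same as the original's implicit fall-through


# Hypothetical adapter: apply(axis=1) passes the row first, and the extra
# keyword arguments carry the column *names* to look up in that row.
def mar_status_row(row, rel_col, sex_col, hour_col):
    return mar_status(row[rel_col], row[sex_col], row[hour_col])


# Made-up sample rows using the question's column names.
df = pd.DataFrame({
    "insured_relationship": ["own_child", "own_child", "wife", "out_of_family"],
    "insured_sex": ["Female", "Male", "Male", "Female"],
    "incident_hour_of_the_day": [2, 15, 9, 20],
})

df["marital_status"] = df.apply(
    mar_status_row,
    axis=1,
    rel_col="insured_relationship",
    sex_col="insured_sex",
    hour_col="incident_hour_of_the_day",
)
print(df)
```

This behaves like the lambda version; the difference is only that the column names stay configurable through the keyword arguments that "apply" forwards.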