I have a dataframe with 3 columns. The first column, named "insured_relationship", takes the values ['own_child', 'wife', 'husband', 'unmarried', 'other_relationship']. The second column, named "insured_sex", takes the values ['Male', 'Female']. The third, named "incident_hour_of_the_day", takes the integer values [0, 1, 2, 3, ..., 23]. I wrote a function, mar_status(), that applies some conditions to these 3 columns in order to create a new column (variable) named "marital_status" in the dataframe. But when I use the "apply" method on the dataframe, I get the following error about my function:
```
TypeError: mar_status() got multiple values for argument 'col1'
```
Note that col1 corresponds to the "insured_relationship" column of my dataframe.
This is the function I created:
```python
def mar_status(col1,col2,col3):
    if col1 == 'own_child' and col3 in range(13) :
        return 'unmarried'
    elif col1 == 'own_child' and col3 in range(13,24) :
        return 'divorced'
    elif col1 == 'wife' and col2 == 'Male' :
        return 'married'
    elif col1 == 'husband' and col2 == 'Female' :
        return 'married'
    elif col1 == 'unmarried':
        return 'unmarried'
    elif col1 == 'other_relationship':
        return 'in relationship'
    elif col1 == 'out_of_family' and col2 == 'Male':
        return 'widower'
    elif col1 == 'out_of_family' and col2 == 'Female' :
        return 'widow'
```
And the "apply" method:
```python
df['marital_status'] = df.apply(mar_status,col1 ='insured_relationship',col2 ='insured_sex',
                                col3 = 'incident_hour_of_the_day',axis = 1)
```
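A minimal reproduction of the same error, assuming a toy two-column frame and a hypothetical stand-in function `two_cols` that uses the same calling convention as `mar_status`:

```python
import pandas as pd


def two_cols(col1, col2):
    # Hypothetical stand-in for mar_status with the same parameter names.
    return f"{col1}-{col2}"


df = pd.DataFrame({'a': ['x', 'y'], 'b': [1, 2]})

error_message = None
try:
    # For each row, DataFrame.apply effectively calls
    # two_cols(row, col1='a', col2='b'): the row is bound to col1
    # positionally, and col1='a' then tries to bind it a second time.
    df.apply(two_cols, col1='a', col2='b', axis=1)
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```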
I expected this to create a new variable named "marital_status" that takes the values ['unmarried', 'divorced', 'married', 'in relationship', 'widow', 'widower'].
The function itself works, but applying it to the dataframe does not. How can I achieve the desired outcome?
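`DataFrame.apply` passes each row to your function as the first positional argument and forwards any extra keyword arguments to it unchanged. So for every row pandas calls `mar_status(row, col1='insured_relationship', ...)`: the row is bound to `col1` positionally and `col1` is then supplied again as a keyword, hence "got multiple values for argument 'col1'". Even without that clash, the keywords would hand your function the column names as strings rather than the row's values. Use a lambda that takes the row and passes the three values to `mar_status` instead: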
```python
df['marital_status'] = df.apply(lambda x: mar_status(x['insured_relationship'], x['insured_sex'], x['incident_hour_of_the_day']), axis=1)
```
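A self-contained sketch of this fix end to end. The function body is the one from the question; the five rows are hypothetical sample data chosen to hit different branches.

```python
import pandas as pd


def mar_status(col1, col2, col3):
    # Branching logic copied from the question.
    if col1 == 'own_child' and col3 in range(13):
        return 'unmarried'
    elif col1 == 'own_child' and col3 in range(13, 24):
        return 'divorced'
    elif col1 == 'wife' and col2 == 'Male':
        return 'married'
    elif col1 == 'husband' and col2 == 'Female':
        return 'married'
    elif col1 == 'unmarried':
        return 'unmarried'
    elif col1 == 'other_relationship':
        return 'in relationship'
    elif col1 == 'out_of_family' and col2 == 'Male':
        return 'widower'
    elif col1 == 'out_of_family' and col2 == 'Female':
        return 'widow'
    # Any other combination falls through and returns None.


# Hypothetical sample rows, not taken from the asker's data.
df = pd.DataFrame({
    'insured_relationship': ['own_child', 'own_child', 'wife', 'husband', 'other_relationship'],
    'insured_sex': ['Female', 'Male', 'Male', 'Female', 'Female'],
    'incident_hour_of_the_day': [2, 17, 9, 23, 5],
})

# The lambda receives one row (a Series) at a time and passes the three
# cell values positionally, so nothing is bound to col1 twice.
df['marital_status'] = df.apply(
    lambda x: mar_status(x['insured_relationship'],
                         x['insured_sex'],
                         x['incident_hour_of_the_day']),
    axis=1,
)
print(df)
```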
Example result:

| insured_relationship | insured_sex | incident_hour_of_the_day | marital_status |
|---|---|---|---|
| own_child | Female | 2 | unmarried |
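If you would rather keep the keyword-argument call from the question, another option is to give the function a first parameter that receives the row and treat `col1`, `col2` and `col3` as column names. A sketch under that assumption, where `mar_status_row` is a hypothetical wrapper and the three rows are made-up sample data:

```python
import pandas as pd


def mar_status(col1, col2, col3):
    # Same branching logic as in the question.
    if col1 == 'own_child' and col3 in range(13):
        return 'unmarried'
    elif col1 == 'own_child' and col3 in range(13, 24):
        return 'divorced'
    elif col1 == 'wife' and col2 == 'Male':
        return 'married'
    elif col1 == 'husband' and col2 == 'Female':
        return 'married'
    elif col1 == 'unmarried':
        return 'unmarried'
    elif col1 == 'other_relationship':
        return 'in relationship'
    elif col1 == 'out_of_family' and col2 == 'Male':
        return 'widower'
    elif col1 == 'out_of_family' and col2 == 'Female':
        return 'widow'


def mar_status_row(row, col1, col2, col3):
    # Hypothetical wrapper: `row` receives the Series that DataFrame.apply
    # passes positionally; col1/col2/col3 are column names passed as keywords.
    return mar_status(row[col1], row[col2], row[col3])


# Hypothetical sample rows.
df = pd.DataFrame({
    'insured_relationship': ['unmarried', 'husband', 'own_child'],
    'insured_sex': ['Male', 'Female', 'Male'],
    'incident_hour_of_the_day': [4, 8, 20],
})

# Same keyword call as in the question, but with the row-aware wrapper.
df['marital_status'] = df.apply(mar_status_row, col1='insured_relationship',
                                col2='insured_sex',
                                col3='incident_hour_of_the_day', axis=1)
print(df)
```

Here the keywords no longer collide with the row, because the row now has its own parameter.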