I would like to define a function that is applied to specific columns of a dataframe whenever it is called. I don't want to hard-code the column names when defining the function. Below is my sample code. The lambda function may be a complex one, but I am trying with a simple one:
def add(X, **args):
    for arg in args:
        X[arg].apply(lambda x: x + 10)
    return X
But when I call this function on my dataframe as shown below, I get an error, even though these columns exist in my dataframe.
y = add(df_final['ABC', 'XYZ'])
KeyError: ('ABC', 'XYZ')
I also tried calling it like this:
y = add(df_final, ['ABC', 'XYZ'])
TypeError: add() takes 1 positional argument but 2 were given
It seems that I am missing something basic here. How can I modify the above code to make it work?
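The **args definition collects keyword arguments (name=value pairs) into a dict, so add accepts only one positional argument, X; that is why the second call raises the TypeError. You need to use *args if you want to pass an arbitrary number of positional arguments after your mandatory X argument.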
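To make the difference concrete, here is a minimal sketch in plain Python. The function name show and the placeholder string 'df' are made up for illustration and stand in for add and the real dataframe.

```python
def show(X, *args, **kwargs):
    # args is a tuple of the extra positional values,
    # kwargs is a dict of the keyword (name=value) values.
    return args, kwargs

cols = ['ABC', 'XYZ']

print(show('df', *cols))   # list unpacked: (('ABC', 'XYZ'), {})
print(show('df', cols))    # list passed as ONE value: ((['ABC', 'XYZ'],), {})
print(show('df', ABC=1))   # keyword argument: ((), {'ABC': 1})
```

Note the second call: without the leading *, the whole list arrives as a single element of args, which is why the list has to be unpacked at the call site.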
In your function you also need to assign the result of apply back to the dataframe column, because apply returns a new Series rather than modifying the column in place. So, given
def add(X, *args):
    for arg in args:
        X[arg] = X[arg].apply(lambda x: x + 10)
    return X
You will get the following:
>>> df
    a   b  ABC  XYZ
0   1   1    6    1
1  34  34    5    2
2  34  34    4    4
3  34  34    3    5
4   d  23    2    6
5   2   2    1    7
>>> df = add(df, *['ABC','XYZ'])
>>> df
    a   b  ABC  XYZ
0   1   1   16   11
1  34  34   15   12
2  34  34   14   14
3  34  34   13   15
4   d  23   12   16
5   2   2   11   17
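Note that add as written modifies the dataframe passed in and returns that same object. If you would rather leave the original untouched, one possible variant is sketched below. The name add_ten and the use of copy() are my own choices rather than part of the code above, and the sample data is rebuilt from the output shown. For a simple + 10 the column arithmetic can replace apply; a more complex lambda would still go through apply.

```python
import pandas as pd

def add_ten(frame, *columns):
    # Work on a copy so the caller's dataframe is not modified (assumed preference).
    out = frame.copy()
    for col in columns:
        # Vectorized equivalent of out[col].apply(lambda x: x + 10).
        out[col] = out[col] + 10
    return out

df = pd.DataFrame({
    'a': [1, 34, 34, 34, 'd', 2],
    'b': [1, 34, 34, 34, 23, 2],
    'ABC': [6, 5, 4, 3, 2, 1],
    'XYZ': [1, 2, 4, 5, 6, 7],
})

result = add_ten(df, 'ABC', 'XYZ')
print(result[['ABC', 'XYZ']])
```

The column names can also be kept in a list and unpacked at the call site, as in add_ten(df, *['ABC', 'XYZ']).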