python-3.x pandas dataframe sklearn-pandas

How to call a function on a pandas dataframe with multiple arguments

I would like to define a function that is applied to specific columns of a dataframe whenever it is called. I don't want to hard-code the column names in the function definition. Below is my sample code. The lambda function may be a complex one, but I am trying a simple one first.

def add(X, **args):
  for arg in args:
    X[arg].apply(lambda x: x + 10)
  return X

But when I call this function on my dataframe as shown below, I get an error, even though these columns exist in my dataframe.

y = add(df_final['ABC', 'XYZ'])

KeyError: ('ABC', 'XYZ')

I also tried calling it like this:

y = add(df_final, ['ABC', 'XYZ'])

TypeError: add() takes 1 positional argument but 2 were given

It seems that I am missing something basic here. How can I modify the above code to make it work?

Solution

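Declaring **args makes add collect extra keyword arguments into a dict, so the function accepts only one positional argument (X). That is why passing the list as a second positional argument raises the TypeError. Use *args instead if you want to pass an arbitrary number of column names after the mandatory X argument.

The KeyError in your first call has a different cause: df_final['ABC', 'XYZ'] looks up a single column whose label is the tuple ('ABC', 'XYZ'). Selecting several columns requires a list, as in df_final[['ABC', 'XYZ']].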
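To make the difference concrete, here is a minimal sketch that uses no pandas at all. The function names show_positional and show_keyword are hypothetical and exist only for illustration; they simply return whatever Python collected.

```python
def show_positional(X, *args):
    # Extra positional arguments are collected into a tuple.
    return args


def show_keyword(X, **kwargs):
    # Extra keyword arguments are collected into a dict.
    return kwargs


print(show_positional("df", "ABC", "XYZ"))     # ('ABC', 'XYZ')
print(show_positional("df", *["ABC", "XYZ"]))  # the list is unpacked: ('ABC', 'XYZ')
print(show_keyword("df", ABC=1, XYZ=2))        # {'ABC': 1, 'XYZ': 2}

try:
    # A second positional argument has nowhere to go with **kwargs.
    show_keyword("df", ["ABC", "XYZ"])
except TypeError as err:
    print("TypeError:", err)
```

The last call fails in the same way as add(df_final, ['ABC', 'XYZ']) in the question.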

Inside the function you also need to assign the result of apply back to the column, because apply returns a new Series and does not modify the dataframe in place. So, given

def add(X, *args):
    for arg in args:
        # apply returns a new Series, so assign it back to the column
        X[arg] = X[arg].apply(lambda x: x + 10)
    return X

You will get the following:

>>> df
    a   b  ABC  XYZ
0   1   1    6    1
1  34  34    5    2
2  34  34    4    4
3  34  34    3    5
4   d  23    2    6
5   2   2    1    7

df = add(df, *['ABC','XYZ'])

>>> df
    a   b  ABC  XYZ
0   1   1   16   11
1  34  34   15   12
2  34  34   14   14
3  34  34   13   15
4   d  23   12   16
5   2   2   11   17
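For a simple operation like adding a constant, apply with a lambda is not strictly needed. The following is a sketch of a possible variant, not the only way to do it: it uses plain column arithmetic and works on a copy so the caller's dataframe stays unchanged. The amount parameter and the three-row sample dataframe are assumptions made for illustration.

```python
import pandas as pd


def add(X, *cols, amount=10):
    """Return a copy of X with `amount` added to each named column."""
    out = X.copy()  # leave the caller's dataframe untouched
    for col in cols:
        # Column arithmetic is vectorized; no apply/lambda is needed here.
        out[col] = out[col] + amount
    return out


# Small sample dataframe (illustrative data only).
df = pd.DataFrame({"ABC": [6, 5, 4], "XYZ": [1, 2, 4]})

cols = ["ABC", "XYZ"]
result = add(df, *cols)  # same as add(df, "ABC", "XYZ")
print(result)
```

If the per-element logic is more complex than a single arithmetic step, the apply version above remains a reasonable choice.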