Tags: python-3.x, pandas, dataframe, sklearn-pandas

How to call a function on a pandas dataframe with multiple arguments


I would like to define a function that is applied to specific columns of a dataframe whenever it is called. I don't want to hard-code the column names when defining the function. Below is my sample code. The lambda function may be a complex one, but I am trying a simple one first.

def add(X, **args):
  for arg in args:
    X[arg].apply(lambda x: x + 10)
  return X

But when I call this function on my dataframe as shown below, I get an error even though these columns exist in my dataframe.

y = add(df_final['ABC', 'XYZ'])

KeyError: ('ABC', 'XYZ')

I also tried calling it like this:

y = add(df_final, ['ABC', 'XYZ'])

TypeError: add() takes 1 positional argument but 2 were given

It seems that I am missing something basic here. How can I modify the above code to make it work?


Solution

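  • Defining the function with **args makes it collect keyword arguments into a dict, so add accepts only one positional argument (X). That is why passing the list as a second positional argument raises the TypeError. The KeyError in the first call comes from df_final['ABC', 'XYZ'], which looks up a single column keyed by the tuple ('ABC', 'XYZ'). Use *args instead if you want to pass an arbitrary number of positional values after the mandatory X argument.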

    Inside the function you also need to assign the result back to the column, because apply returns a new Series rather than modifying the dataframe in place. So, given

    def add(X, *args):
        for arg in args:
            X[arg] = X[arg].apply(lambda x: x + 10)
        return X
    

    You will get the following:

    >>> df
        a   b  ABC  XYZ
    0   1   1    6    1
    1  34  34    5    2
    2  34  34    4    4
    3  34  34    3    5
    4   d  23    2    6
    5   2   2    1    7
    
    df = add(df, *['ABC','XYZ'])
    
    >>> df
        a   b  ABC  XYZ
    0   1   1   16   11
    1  34  34   15   12
    2  34  34   14   14
    3  34  34   13   15
    4   d  23   12   16
    5   2   2   11   17
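
As a self-contained variation on the answer above, the sketch below shows the same *args pattern end to end. The names add_offset, offset, and the small sample frame are hypothetical and chosen only for illustration. It also assumes you would rather get a modified copy back than change the original dataframe, and that the columns are numeric so plain vectorized addition can stand in for apply.

```python
import pandas as pd


def add_offset(frame, *columns, offset=10):
    """Return a copy of `frame` with `offset` added to each named column.

    `add_offset` and `offset` are illustrative names, not part of pandas.
    """
    result = frame.copy()  # work on a copy so the caller's dataframe is untouched
    for column in columns:
        # Vectorized addition; for numeric columns this gives the same
        # result as .apply(lambda x: x + offset)
        result[column] = result[column] + offset
    return result


# Hypothetical sample data
df = pd.DataFrame({"ABC": [6, 5, 4], "XYZ": [1, 2, 4], "b": [1, 34, 34]})

# Pass the column names as separate positional arguments...
shifted = add_offset(df, "ABC", "XYZ")

# ...or unpack an existing list with *
cols = ["ABC", "XYZ"]
shifted_from_list = add_offset(df, *cols)
```

Because offset is declared after *columns, it can only be passed by keyword (for example add_offset(df, "ABC", offset=5)), so a column name is never mistaken for the offset.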