How to use pandas.DataFrame.apply with getattr function in Python


Suppose I'd like to remove '$' signs from my DataFrame in pandas. I have created a class called TransformFunctions so that I can use getattr() to invoke a method of that class by name. The reason is that I keep a separate JSON file listing the method names associated with each column to process; because JSON can only store the names as strings, I decided to look up the methods from those strings, using a suggestion given here.

The code is as below:

import pandas as pd

class TransformFunctions(object):
    def remove_dollar(self, cell_str):
        return float(cell_str.replace("$", "").replace(",", ""))

data = {
    'dpt':[868, 868, 69],
    'name':['B J SANDIFORD', 'C A WIGFALL', 'A E A-AWOSOGBA'],
    'address':['  DEPARTMENT OF CITYWIDE ADM', 'DEPARTMENT OF CITYWIDE ADM  ', ' HRA/DEPARTMENT OF SOCIAL S '],
    'ttl#':['12702', '12702', '52311'],
    'pc':[' X ',' X', 'A '],
    'sal-rate':['$5.00', '$5.00', '$51,955.00']
}
df = pd.DataFrame(data)
klass = TransformFunctions()
# The next line raises:
# TypeError: remove_dollar() missing 1 required positional argument: 'cell_str'
df['sal-rate'] = df['sal-rate'].apply(getattr(klass,'remove_dollar')())

I'd like to know how to use apply from pandas.DataFrame to invoke methods via getattr if possible. Thank you in advance for your suggestions/answers!


Solution

  • getattr(klass, 'remove_dollar') returns the bound method remove_dollar. By putting () after getattr(...), you call that method immediately with no argument, before apply ever runs, which raises the TypeError. Pass the method itself to apply by removing the ():

    df['sal-rate'] = df['sal-rate'].apply(getattr(klass,'remove_dollar'))
    
    Out[952]:
                            address  dpt            name   pc  sal-rate   ttl#
    0    DEPARTMENT OF CITYWIDE ADM  868   B J SANDIFORD   X        5.0  12702
    1  DEPARTMENT OF CITYWIDE ADM    868     C A WIGFALL    X       5.0  12702
    2   HRA/DEPARTMENT OF SOCIAL S    69  A E A-AWOSOGBA   A    51955.0  52311
    

    If the method name is known in advance, you can also pass klass.remove_dollar to apply directly:

    df['sal-rate'].apply(klass.remove_dollar)
    
    Out[955]:
    0        5.0
    1        5.0
    2    51955.0
    Name: sal-rate, dtype: float64
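
Since the question mentions driving the processing from a JSON file that maps columns to method names, here is a minimal sketch of how that lookup could work with getattr. The JSON layout, the config_json string, and the strip_spaces method are hypothetical and only for illustration; the actual file format is not shown in the question.

```python
import json

import pandas as pd


class TransformFunctions:
    def remove_dollar(self, cell_str):
        # Strip the currency symbol and thousands separators, then convert.
        return float(cell_str.replace("$", "").replace(",", ""))

    def strip_spaces(self, cell_str):
        # Hypothetical second transform, added only for illustration.
        return cell_str.strip()


# Hypothetical JSON config: column name -> method name (both strings).
config_json = '{"sal-rate": "remove_dollar", "pc": "strip_spaces"}'
column_to_method = json.loads(config_json)

df = pd.DataFrame({
    "pc": [" X ", " X", "A "],
    "sal-rate": ["$5.00", "$5.00", "$51,955.00"],
})

transformer = TransformFunctions()
for column, method_name in column_to_method.items():
    # getattr returns the bound method; hand it to apply WITHOUT calling it.
    method = getattr(transformer, method_name)
    df[column] = df[column].apply(method)

print(df)
```

If a method name in the config does not exist on the class, getattr raises AttributeError, so a typo in the JSON file fails loudly at lookup time instead of silently skipping the column.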