Tags: python, python-3.x, pandas, dataframe, binning

Applying my custom function to a data frame in Python


I have a dataframe with a column called Signal. I want to add a new column to that dataframe by applying a custom function I've built. I'm very new at this and I seem to be having trouble passing values from a dataframe column into a function, so any help with my syntax errors or reasoning would be greatly appreciated!

Signal
3.98
3.78
-6.67
-17.6
-18.05
-14.48
-12.25
-13.9
-16.89
-13.3
-13.19
-18.63
-26.36
-26.23
-22.94
-23.23
-15.7

This is my simple function:

def slope_test(x):
    if x > 0 and x < 20:
        return 'Long'
    elif x < 0 and x > -20:
        return 'Short'
    else:
        return 'Flat'

I keep getting this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here is the code I've tried:

data['Position'] = data.apply(slope_test(data['Signal']))

and also:

data['Position'] = data['Signal'].apply(slope_test(data['Signal']))

Solution

  • You can use numpy.select for a vectorised solution:

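    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'Signal': [3.98, -6.67, -26.36]})  # a few rows from the question
    
    # 'neither' excludes both endpoints, matching the strict < and > in slope_test
    # (pandas >= 1.3; older versions spelled this inclusive=False)
    conditions = [df['Signal'].between(0, 20, inclusive='neither'),
                  df['Signal'].between(-20, 0, inclusive='neither')]
    
    values = ['Long', 'Short']
    
    # Rows that match neither condition get the default, 'Flat'
    df['Cat'] = np.select(conditions, values, 'Flat')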
    

    Explanation

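    You are attempting to perform operations on a Series as if it were a scalar. Inside slope_test, x > 0 produces a boolean Series, and the and operator then asks Python for a single True/False value for that whole Series, which is exactly what your error complains about. In addition, you are calling pd.Series.apply incorrectly: it expects a function as its argument, but slope_test(data['Signal']) calls the function immediately and passes its result instead. Pass the function object itself: df['Signal'].apply(slope_test).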
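    A minimal sketch of both points, assuming a small DataFrame named df built from a few Signal values in the question: calling slope_test on the whole column reproduces the ValueError, while handing the function itself to apply makes pandas call it once per scalar value.

```python
import pandas as pd


def slope_test(x):
    # Works on a single number: `and` needs a plain True/False on each side
    if x > 0 and x < 20:
        return 'Long'
    elif x < 0 and x > -20:
        return 'Short'
    else:
        return 'Flat'


# Assumed sample: a few rows taken from the question's Signal column
df = pd.DataFrame({'Signal': [3.98, 3.78, -6.67, -26.36]})

# Wrong: slope_test receives the whole Series, so `and` calls bool() on it
try:
    slope_test(df['Signal'])
except ValueError as exc:
    print('ValueError:', exc)

# Right: pass the function object; apply calls it once per element
df['Position'] = df['Signal'].apply(slope_test)
print(df)
```

    Note that the buggy call fails before apply is ever reached, which is why both of your attempts raise the same error.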

    But pd.Series.apply is essentially a Python-level loop: it calls your function once per element, which gets slow on large data. You should instead use the vectorised operations that work directly on the NumPy arrays underlying your Pandas dataframe. In fact, this is a good reason for using Pandas in the first place.
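    As a sketch of that claim, the block below runs both approaches on the full Signal column from the question, checks that they agree, and then times them on a tiled copy of the data. The helper name slope_vectorised is hypothetical, the 1000x tiling is an arbitrary choice to make timings measurable, and inclusive='neither' assumes pandas 1.3 or later. Timings depend on your machine and library versions, so none are shown here.

```python
import timeit

import numpy as np
import pandas as pd


def slope_test(x):
    # Scalar version from the question
    if x > 0 and x < 20:
        return 'Long'
    elif x < 0 and x > -20:
        return 'Short'
    else:
        return 'Flat'


def slope_vectorised(s):
    # Hypothetical helper: the same rules expressed as whole-column boolean masks
    conditions = [s.between(0, 20, inclusive='neither'),
                  s.between(-20, 0, inclusive='neither')]
    return pd.Series(np.select(conditions, ['Long', 'Short'], 'Flat'), index=s.index)


signal = pd.Series([3.98, 3.78, -6.67, -17.6, -18.05, -14.48, -12.25, -13.9,
                    -16.89, -13.3, -13.19, -18.63, -26.36, -26.23, -22.94,
                    -23.23, -15.7], name='Signal')

looped = signal.apply(slope_test)
vectorised = slope_vectorised(signal)
print('same labels:', (looped == vectorised).all())

# Repeat the data so the difference is measurable; results vary by machine
big = pd.Series(np.tile(signal.to_numpy(), 1000))
print('apply:    ', timeit.timeit(lambda: big.apply(slope_test), number=10))
print('np.select:', timeit.timeit(lambda: slope_vectorised(big), number=10))
```

    Both versions treat 0, 20 and -20 as 'Flat', because the original function and the 'neither' setting both use strict inequalities.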