python python-2.7 pandas-datareader

Handling ValueError in Python while performing range binning


I am trying to classify the values of a pandas column into labeled ranges, but I get a ValueError when I use bisect.

from pandas_datareader import data
import pandas
import bisect
import fix_yahoo_finance as yf

yf.pdr_override() 
df = data.get_data_yahoo('SPY', '2015-01-01', '2018-04-05')
df.tail(2)

def Daily_Returns(A, B):
    return (B - A)*100/A

df['OC_Return_%'] = Daily_Returns(df['Open'], df['Close'])

def b(value):
    intervals = ['Less Than -10 %','-10% to -5%','-5% to -2.5%','-2.5% to -2%','-2% to -1.5%','-1.5% to -1%','-1% to -0.5%','-0.5% to 0%','0% to 0.5%','0.5% to 1%','1% to 1.5%','1.5% to 2%','2% to 2.5%','2.5% to 5%','5% to 10%','Greater Than 10 %']
    return intervals[bisect.bisect_left([-float('inf'),-10,-5,-2.5,-2,-1.5,-1,-0.5,0,0.5,1,1.5,2,2.5,5,10,float('inf')], value)-1]

df['OC_Return_Bin'] = b(df["OC_Return_%"])
df

The error disappears if I use a.any() or a.all(), but then the result column is filled with wrong values.

This is the entire traceback, as requested in the comments.

ValueError                                Traceback (most recent call last)
<ipython-input-80-a571e502f6a6> in <module>()
 17     return intervals[bisect.bisect_left([-float('inf'),-10,-5,-2.5,-2,-1.5,-1,-0.5,0,0.5,1,1.5,2,2.5,5,10,float('inf')], value)-1]
 18 
 19 df['OC_Return_Bin'] = b(df["OC_Return_%"])
 20 df

<ipython-input-80-a571e502f6a6> in b(value)
 15 def b(value):
 16     intervals = ['Less Than -10 %','-10% to -5%','-5% to -2.5%','-2.5% to -2%','-2% to -1.5%','-1.5% to -1%','-1% to -0.5%','-0.5% to 0%','0% to 0.5%','0.5% to 1%','1% to 1.5%','1.5% to 2%','2% to 2.5%','2.5% to 5%','5% to 10%','Greater Than 10 %']
 17     return intervals[bisect.bisect_left([-float('inf'),-10,-5,-2.5,-2,-1.5,-1,-0.5,0,0.5,1,1.5,2,2.5,5,10,float('inf')], value)-1]
 18 
 19 df['OC_Return_Bin'] = b(df["OC_Return_%"])

C:\Users\USER\Anaconda2\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
953         raise ValueError("The truth value of a {0} is ambiguous. "
954                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
955                          .format(self.__class__.__name__))
956 
957     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Solution

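  • The problem is that your function b cannot work with a Series of values; it can only process a single value. bisect compares each breakpoint with its argument, and comparing a number with a Series yields a Series of booleans, whose truth value is ambiguous, hence the ValueError. Collapsing that comparison with a.any() or a.all() silences the error but reduces the whole column to a single True/False, which is why the result column ends up with wrong values. To fix it, either call b element by element with Series.apply, e.g. df['OC_Return_Bin'] = df["OC_Return_%"].apply(b), or rewrite b so that it accepts a Series.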
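Both options can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: the sample returns are made-up numbers standing in for df['OC_Return_%'] so that nothing has to be downloaded, the names BREAKS and LABELS and the column OC_Return_Bin_cut are introduced here for readability, and pandas.cut is one possible way to make the binning work on a whole Series (it is a swapped-in technique, not something from the question).

```python
import bisect

import numpy as np
import pandas as pd

# Breakpoints and labels copied from the question (names BREAKS/LABELS are illustrative).
BREAKS = [-np.inf, -10, -5, -2.5, -2, -1.5, -1, -0.5, 0,
          0.5, 1, 1.5, 2, 2.5, 5, 10, np.inf]
LABELS = ['Less Than -10 %', '-10% to -5%', '-5% to -2.5%', '-2.5% to -2%',
          '-2% to -1.5%', '-1.5% to -1%', '-1% to -0.5%', '-0.5% to 0%',
          '0% to 0.5%', '0.5% to 1%', '1% to 1.5%', '1.5% to 2%',
          '2% to 2.5%', '2.5% to 5%', '5% to 10%', 'Greater Than 10 %']


def b(value):
    """Return the label for one scalar value (not for a whole Series)."""
    return LABELS[bisect.bisect_left(BREAKS, value) - 1]


# Made-up sample returns standing in for df['OC_Return_%'].
df = pd.DataFrame({'OC_Return_%': [-12.0, -0.3, 0.3, 1.2, 7.0, 11.0]})

# Passing the whole Series reproduces the error from the question.
try:
    b(df['OC_Return_%'])
    raised = False
except ValueError:
    raised = True

# Option 1: call b once per element.
df['OC_Return_Bin'] = df['OC_Return_%'].apply(b)

# Option 2: bin the whole column at once. With the default right=True,
# pd.cut uses (low, high] intervals, the same edges bisect_left gives here.
df['OC_Return_Bin_cut'] = pd.cut(df['OC_Return_%'], bins=BREAKS, labels=LABELS)

print(df)
```

Note that pd.cut returns a categorical column rather than plain strings; use .astype(str) on it if you need ordinary string labels.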