I am trying to classify pandas column values into range bins, but I get a ValueError when I use bisect.
from pandas_datareader import data
import pandas
import bisect
import fix_yahoo_finance as yf
yf.pdr_override()
df = data.get_data_yahoo('SPY', '2015-01-01', '2018-04-05')
df.tail(2)
def Daily_Returns(A, B):
    return (B - A)*100/A
df['OC_Return_%'] = Daily_Returns(df['Open'], df['Close'])
def b(value):
    intervals = ['Less Than -10 %','-10% to -5%','-5% to -2.5%','-2.5% to -2%','-2% to -1.5%','-1.5% to -1%','-1% to -0.5%','-0.5% to 0%','0% to 0.5%','0.5% to 1%','1% to 1.5%','1.5% to 2%','2% to 2.5%','2.5% to 5%','5% to 10%','Greater Than 10 %']
    return intervals[bisect.bisect_left([-float('inf'),-10,-5,-2.5,-2,-1.5,-1,-0.5,0,0.5,1,1.5,2,2.5,5,10,float('inf')], value)-1]
df['OC_Return_Bin'] = b(df["OC_Return_%"])
df
The error disappears if I use a.any() or a.all(), but then the result column is filled with wrong values.
This is the entire traceback, as requested in the comments.
ValueError Traceback (most recent call last)
<ipython-input-80-a571e502f6a6> in <module>()
17 return intervals[bisect.bisect_left([-float('inf'),-10,-5,-2.5,-2,-1.5,-1,-0.5,0,0.5,1,1.5,2,2.5,5,10,float('inf')], value)-1]
18
19 df['OC_Return_Bin'] = b(df["OC_Return_%"])
20 df
<ipython-input-80-a571e502f6a6> in b(value)
15 def b(value):
16 intervals = ['Less Than -10 %','-10% to -5%','-5% to -2.5%','-2.5% to -2%','-2% to -1.5%','-1.5% to -1%','-1% to -0.5%','-0.5% to 0%','0% to 0.5%','0.5% to 1%','1% to 1.5%','1.5% to 2%','2% to 2.5%','2.5% to 5%','5% to 10%','Greater Than 10 %']
17 return intervals[bisect.bisect_left([-float('inf'),-10,-5,-2.5,-2,-1.5,-1,-0.5,0,0.5,1,1.5,2,2.5,5,10,float('inf')], value)-1]
18
19 df['OC_Return_Bin'] = b(df["OC_Return_%"])
C:\Users\USER\Anaconda2\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
953 raise ValueError("The truth value of a {0} is ambiguous. "
954 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
955 .format(self.__class__.__name__))
956
957 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
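The problem is that your function b cannot work with a Series of values; it can only process a single value. bisect.bisect_left compares its argument against the list elements with <, and on a Series that comparison returns a Series of booleans, whose truth value is ambiguous. Calling a.any() or a.all() silences the error but collapses the whole column to a single boolean, so the bins no longer reflect the individual values. To fix it, you can either apply b element by element with Series.apply, e.g. df['OC_Return_Bin'] = df["OC_Return_%"].apply(b)
or make it capable of working with a Series.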
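For the second option, one possible approach is to hand the binning to pandas.cut instead of bisect. The following is only a minimal sketch under that assumption: the names EDGES, LABELS, bin_returns and the sample values are made up for illustration and do not come from the question. With its default right=True, pandas.cut uses right-closed intervals, which matches what bisect_left(...) - 1 does for finite values. NaN inputs are left as NaN rather than being assigned a label.

```python
import bisect

import pandas as pd

# Bin edges and labels copied from the question's function b.
EDGES = [-float('inf'), -10, -5, -2.5, -2, -1.5, -1, -0.5, 0,
         0.5, 1, 1.5, 2, 2.5, 5, 10, float('inf')]
LABELS = ['Less Than -10 %', '-10% to -5%', '-5% to -2.5%', '-2.5% to -2%',
          '-2% to -1.5%', '-1.5% to -1%', '-1% to -0.5%', '-0.5% to 0%',
          '0% to 0.5%', '0.5% to 1%', '1% to 1.5%', '1.5% to 2%',
          '2% to 2.5%', '2.5% to 5%', '5% to 10%', 'Greater Than 10 %']


def b(value):
    """Scalar version from the question: works on one number at a time."""
    return LABELS[bisect.bisect_left(EDGES, value) - 1]


def bin_returns(returns):
    """Hypothetical vectorized version: bins a whole Series in one call.

    pandas.cut with right=True (the default) puts each value into the
    interval (left, right], the same convention as bisect_left(...) - 1.
    """
    return pd.cut(returns, bins=EDGES, labels=LABELS)


# Illustrative sample values, not real SPY data.
sample = pd.Series([-12.0, -0.3, 0.0, 0.3, 1.2, 7.5])
binned = bin_returns(sample)
print(binned)
```

On the question's DataFrame this would be used as df['OC_Return_Bin'] = bin_returns(df['OC_Return_%']). The result is a categorical column; call .astype(str) on it if plain strings are preferred.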