
How to resolve ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()


Here's a sample of the column of my dataset on which I am working now:

print (data)
     Credit Days
0             30
1   Cash & Carry
2   Cash & Carry
3             20
4             20
5             30
6             15
7             10
8             15
9   Cash & Carry
10            10
11            10
12            21
13  Cash & Carry
14            20
15            20

So this column contains both string and integer values. I have to convert these values to integer ratings and save them in a newly created column, say credit_days_rating. For that I wrote this code:

data = pd.read_csv('test.csv', engine='python')

data['Credit Days'].astype(str)
if data['Credit Days']=='Cash & Carry':
    data['credit_days_rating'] = 4
else :
    data['Credit Days'].astype(int)
    if (data['Credit Days']>= 10) & (data['Credit Days']< 19):
        data['credit_days_rating'] = 3
    elif (data['Credit Days']>= 20) & (data['Credit Days']< 29):
        data['credit_days_rating'] = 2 
    else :
        data['credit_days_rating'] = 1 

For that I am getting the following error log:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-65-f6ecf070a2d4> in <module>()
      2 
      3 data['Credit Days'].astype(str)
----> 4 if (data['Credit Days']=='Cash & Carry'):
      5     data['credit_days_rating'] = 5
      6 else :

~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/pandas/core/generic.py in __nonzero__(self)
   1119         raise ValueError("The truth value of a {0} is ambiguous. "
   1120                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1121                          .format(self.__class__.__name__))
   1122 
   1123     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

The new column should look like this:

[image: expected credit_days_rating column]


Solution

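  • The comparison data['Credit Days']=='Cash & Carry' returns a boolean Series with one value per row, so if cannot reduce it to a single True or False, and that raises the ValueError. You can use numpy.select to set values from a list of conditions instead. To compare the numeric values, first use to_numeric with errors='coerce', which converts non-numeric entries to NaN: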

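    import numpy as np
    import pandas as pd

    # Boolean mask for the string category
    m1 = data['Credit Days'] == 'Cash & Carry'

    # Non-numeric entries such as 'Cash & Carry' become NaN
    s = pd.to_numeric(data['Credit Days'], errors='coerce')
    m2 = (s >= 10) & (s < 19)
    m3 = (s >= 20) & (s < 29)
    masks = [m1, m2, m3]
    vals = [4, 3, 2]
    # The first matching mask wins; rows matching no mask get the default
    data['credit_days_rating'] = np.select(masks, vals, default=1)
    print (data)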
         Credit Days  credit_days_rating
    0             30                   1
    1   Cash & Carry                   4
    2   Cash & Carry                   4
    3             20                   2
    4             20                   2
    5             30                   1
    6             15                   3
    7             10                   3
    8             15                   3
    9   Cash & Carry                   4
    10            10                   3
    11            10                   3
    12            21                   2
    13  Cash & Carry                   4
    14            20                   2
    15            20                   2
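
For anyone who wants to try this without test.csv, here is a minimal self-contained sketch. It rebuilds the sample column by hand, assuming the values arrive as strings (as they typically would for a mixed column read from a CSV). It first shows that the comparison yields one boolean per row, which is why a plain if fails, and then applies the same np.select approach. The helper name rate_credit_days is hypothetical and only used for illustration.

```python
import numpy as np
import pandas as pd

# Sample column reconstructed from the question (assumed to be strings).
data = pd.DataFrame({
    'Credit Days': ['30', 'Cash & Carry', 'Cash & Carry', '20', '20', '30',
                    '15', '10', '15', 'Cash & Carry', '10', '10', '21',
                    'Cash & Carry', '20', '20']
})

# The comparison returns one boolean per row, not a single True/False,
# so `if` cannot decide which branch to take and raises ValueError.
is_cash = data['Credit Days'] == 'Cash & Carry'
error_message = None
try:
    if is_cash:
        pass
except ValueError as exc:
    error_message = str(exc)

# .any() / .all() reduce the mask to a single boolean when that is what you want.
any_cash = bool(is_cash.any())
all_cash = bool(is_cash.all())


def rate_credit_days(column):
    """Hypothetical helper: map a 'Credit Days' column to ratings, row by row."""
    # Non-numeric entries such as 'Cash & Carry' become NaN;
    # comparisons against NaN are False, so those rows skip the numeric masks.
    days = pd.to_numeric(column, errors='coerce')
    masks = [
        column == 'Cash & Carry',
        (days >= 10) & (days < 19),
        (days >= 20) & (days < 29),
    ]
    ratings = np.select(masks, [4, 3, 2], default=1)
    return pd.Series(ratings, index=column.index)


data['credit_days_rating'] = rate_credit_days(data['Credit Days'])
print(data)
```

Note that the bounds are kept exactly as in the question, so values of 19 and 29 fall through to the default rating of 1.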