
ValueError when trying to write a for loop in python


When I run this:

import pandas as pd

data = {'id': ['earn', 'earn','lose', 'earn'],
    'game': ['darts', 'balloons', 'balloons', 'darts']
    }

df = pd.DataFrame(data)
print(df)
print(df.loc[[1],['id']] == 'earn')

The output is:
     id      game
0  earn     darts
1  earn  balloons
2  lose  balloons
3  earn     darts
     id
1  True

But when I try to run this loop:

for i in range(len(df)):  
     if (df.loc[[i],['id']] == 'earn'):  
         print('yes')  
     else:  
         print('no')

I get the error 'ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().' I am not sure what the problem is. Any help or advice is appreciated -- I am just starting.

I expected the loop to print 'yes', but I only got the 'ValueError' message. When I run the condition by itself, the output is 'True', so I'm not sure what is wrong.


Solution

  • It's complicated. pandas is geared towards operating on entire groups of data, not individual cells. df.loc may return a new DataFrame, a Series or a single value, depending on how it's indexed. The == comparison then produces a DataFrame, a Series or a scalar result to match.

    If the indexers are both lists, you get a new DataFrame, and the result of the comparison is also a DataFrame.

    >>> foo = df.loc[[1], ['id']]
    >>> type(foo)
    <class 'pandas.core.frame.DataFrame'>
    >>> foo
         id
    1  earn
    >>> foo == "earn"
         id
    1  True
    

    If one indexer is a scalar and the other is a list, you get a new Series, and the comparison is a Series too.

    >>> foo = df.loc[[1], 'id']
    >>> type(foo)
    <class 'pandas.core.series.Series'>
    >>> foo
    1    earn
    Name: id, dtype: object
    >>> foo == 'earn'
    1    True
    Name: id, dtype: bool
    

    If both indexers are scalar, you get a single cell's value

    >>> foo = df.loc[1, 'id']
    >>> type(foo)
    <class 'str'>
    >>> foo
    'earn'
    >>> foo == 'earn'
    True
    

    That last one is the one you want. The first two produce containers whose truth value is ambiguous: an if statement needs a single True or False, so you would have to decide whether any or all of the values need to be True.

    for i in range(len(df)):
        if df.loc[i, 'id'] == 'earn':
            print('yes')
        else:
            print('no')
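
If you would rather keep a list-style row indexer as in the question, the ambiguity can be resolved by reducing the comparison to a single boolean yourself. This is only a minimal sketch, assuming the same df as in the question; the names mask and labels are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['earn', 'earn', 'lose', 'earn'],
    'game': ['darts', 'balloons', 'balloons', 'darts'],
})

labels = []  # hypothetical name, collects one answer per row
for i in range(len(df)):
    # List row indexer + scalar column indexer -> one-element boolean Series.
    mask = df.loc[[i], 'id'] == 'earn'
    # .all() collapses the Series to a single boolean, so `if` is unambiguous.
    labels.append('yes' if mask.all() else 'no')

print(labels)
```

With two list indexers the comparison is a DataFrame, and .all() only reduces one axis at a time, so it would have to be applied twice. That is one more reason the scalar form is simpler here.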
    

    Or maybe not. Depending on what you intend to do next, you may not need the loop at all: you can create a Series of boolean values for all of the rows at once.

    >>> earn = df['id'] == 'earn'
    >>> earn
    0     True
    1     True
    2    False
    3     True
    Name: id, dtype: bool
    

    Now you can continue to make calculations on the DataFrame as a whole.
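
As a rough sketch of what working on the whole column can look like, the boolean Series can be turned into the yes/no answers from the question without a loop. This assumes the same df as in the question; the result column and the names n_earn and earn_rows are made up for illustration, and numpy.where is just one of several ways to do it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['earn', 'earn', 'lose', 'earn'],
    'game': ['darts', 'balloons', 'balloons', 'darts'],
})

# Boolean Series: True for every row whose id is 'earn'.
earn = df['id'] == 'earn'

# np.where picks 'yes' where the mask is True and 'no' elsewhere.
df['result'] = np.where(earn, 'yes', 'no')  # 'result' is a hypothetical column name
print(df)

# The same mask can count or filter rows directly.
n_earn = int(earn.sum())   # True counts as 1 when summed
earn_rows = df[earn]       # only the rows where the mask is True
```

Whether this is preferable to the loop depends on what comes next; for a handful of rows either approach works.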