I have a dataframe with (in this example) 2 rows and the dataframe looks a bit like this:
PERON START END
AB 100 120
CC 110 115
(What I want , but which is not the question I have, is to make a new column with a flag for each row if "START"-'END' is equal to 20) (I also had similar problems in other use-cases)
(I tried
df['New']= df.apply(lambda x: 1 if abs(df.START-df.END)==20 else 0
I also tried:
for i in df.iterrows():
print i[0]
if abs(df.START-df.END)==20:
print 'Legit to make be a flag'
Problem: Each time I get the message "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
I understand that if you do not go through each row python doesn't know to which value to compare the "if-statement", therefore you use a.any() if you want to give a flag if at least 1 row does satify the condition of the if-statement. a.all() if all rows should match the row-condition...
Now, my question: Why is python not checking the if-statement on row-basis (certainly in the for-loop) and keeps it giving me the "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."-Error
Other formulation of the question: Why is python in the "for i in df.iterrows():" still executing the if-statement on the whole dataframe-level instead on row-by-row basis.
Because in the code below it prints that the for loop is in row '0' so I would assume python would do the if statement on row '0'. But it does not check the if statement for this row because it gives me the "Ambiguous thruth value error"...
for i in df.iterrows():
print i[0]
I've looked at the "using-if-truth-statements-with-pandas"-documentation and some other stackoverflow questions but I can't figure out where I go wrong (highlikely because I'm still in the python-noob-phase).
This is down to how pandas vectorizes operations where it can.
abs(df.START-df.END) ==20
itself returns a series. This series is the column you seek:
>>> df = pd.DataFrame([[100,120],[110,115]],index=['AB','CC'],columns=['START','END'])
... df
... abs(df.START-df.END)==20
13: AB True
CC False
dtype: bool
Observe:
>>> df['New'] = abs(df.START-df.END)==20
>>> df
15: START END New
AB 100 120 True
CC 110 115 False
pandas is designed for this kind of operation.
You should not be disheartened, as both of your attempts were pretty close:
df['New']= df.apply(lambda x: 1 if abs(df.START-df.END)==20 else 0)
is broken, because abs(df.START-df.END)==20 returns a Series. For each row, that lambda is asking if a Series is True, hence the error.
If instead, you had:
df['New']= df.apply(lambda x: 1 if abs(x.START-x.END)==20 else 0, axis =1)
this would produce the desired result, because abs(x.START-x.END)==20 returns a boolean, not a series.