I have been practicing with pandas and have the following problem. I have been practicing with an Olympic medal data set where the countries are listed in one column, and the medals in subsequent columns.
As part of a QC check, I wanted to write a line of code that checks that the sum of the gold (01 !), silver (02 !) and bronze (03 !) medals is equal to the value in the 'total' column. I have tried this a number of ways, including .apply with the function below, and also with .loc[] (see code below). However, I keep on getting errors back. Can someone explain where I am going wrong?
To expand, the idea of the code would be to select all rows where the sum is correct, and exclude those which are not. I have shown two of the methods I have tried below.
I am new to python and coding in general so apologies if my question is not clear enough.
Many thanks
Luke
df = pd.DataFrame({Country: ['Afghanistan', 'Algeria', 'Argentina'],
01 !: [0, 0, 1],
02 !: [5, 2, 8],
03 !: [18, 24, 28],
Total: [23, 26, 38]})
def medal_sum_check(gold, silver, bronze, total):
if gold + silver + bronze == total:
return True
else:
return False
df2 = df.loc[df.apply(medal_sum_check(df.iloc[1], df.iloc[2], df.iloc[3], df.iloc[4]))]
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
or
df2 = df.loc[(df.iloc[1] + df.iloc[2] + df.iloc[3] == df.iloc[4])]
ValueError: cannot reindex from a duplicate axis
You can directly add do the conditional check on the entire column in pandas. Here is a demonstration:
df = pd.DataFrame({"Country": ['Afghanistan', 'Algeria', 'Argentina'],
"01 !": [0, 0, 1],
"02 !": [5, 2, 8],
"03 !": [18, 24, 28],
"Total": [23, 26, 38]})
print(df["01 !"] + df["02 !"] + df["03 !"] == df["Total"])
Output:
0 True
1 True
2 False
You can also add the check output as a new column in your data frame by doing this:
df["check"] = df["01 !"] + df["02 !"] + df["03 !"] == df["Total"]
Then your DataFrame will look like this:
Country 01 ! 02 ! 03 ! Total check
0 Afghanistan 0 5 18 23 True
1 Algeria 0 2 24 26 True
2 Argentina 1 8 28 38 False