Python: how to use result of groupby().size()

I have a Kickstarter Dataset and want to show the success rate in a plot. So there is a Column "Successful" which is either "Yes" or "No" and the different countries.

Succesfull=df[['country', 'succesfull']].groupby(['country', 'succesfull']).size()

with this line I created a new data frame which shows how many projects in a country were successful and how many not. It looks like that:

How can I now calculate the success rate ( Amount of "Yes" / Total Amount) of each country? how can I access the last column which doesn't have a name (there is just a 0).

Solution

This may not be the best approach, but this is how I split up the problem.

1) make a small reproducible example to test different cases

df = pd.DataFrame({'country': ['AU', 'AU', 'CA', 'CA', 'GB', 'GB'], 'successfull': ['Yes', 'No', 'No', 'No', 'Yes', 'Yes']})

df
Out[3]: 
  country successfull
0      AU         Yes
1      AU          No
2      CA          No
3      CA          No
4      GB         Yes
5      GB         Yes

2) counting successful projects

Matching up only the values with 'Yes' and counting them

success = df[df['successfull'] == 'Yes'].groupby('country').size()

success
Out[5]: 
country
AU    1
GB    2
dtype: int64

3) counting total projects

In a similar fashion to the question posted.

total = df.groupby('country').size()

total
Out[7]: 
country
AU    2
CA    2
GB    2
dtype: int64

4) Take the ratio of successful projects to total projects

Once these two series exist the ratio can be taken. If a country does not have a 'Yes', it will become NaN. Replace these with 0, because the success rate is 0 here.

Successfull = (success/total).replace(np.nan, 0)
Out[10]: 
country
AU    0.5
CA    0.0
GB    1.0
dtype: float64

5) accessing the data

The Successfull object above is a Pandas.Series. Below are a few examples for accessing. For example, to know how successful a country is, use the country code as the index:

Successfull['AU']
Out[11]: 0.5

To get an array of success ratios the .values attribute can be accessed.

Successfull.values
Out[12]: array([0.5, 0. , 1. ])

To plot the results

Successfull.plot()