I want to calculate the rate_death percentage as below - (new_deaths / population) * 100 after grouping by location and summing new_deaths.
Example: for Afghanistan, rate_death must calculate as ((1+4+10) / 38928341) * 100 And for Albania, it must calculate as ((0+0+1) / 2877800) * 100
Below is the data and approaches which I tried but not working -
df_data
location date new_cases new_deaths population 0 Afghanistan 4/25/2020 70 1 38928341 1 Afghanistan 4/26/2020 112 4 38928341 2 Afghanistan 4/27/2020 68 10 38928341 3 Albania 4/25/2020 15 0 2877800 4 Albania 4/26/2020 34 0 2877800 5 Albania 4/27/2020 14 1 2877800
Data columns (total 5 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 location 6 non-null object 1 date 6 non-null object 2 new_cases 6 non-null int64 3 new_deaths 6 non-null int64 4 population 6 non-null int64
Approach 1:
df_res = df_data[['location','new_deaths','population']].groupby(['location']).sum()
location new_deaths population Afghanistan 15 116785023 Albania 1 8633400
df_res['rate_death'] = (df_res['new_deaths'] / df_res['population'] * 100.0)
location new_deaths population rate_death Afghanistan 15 116785023 0.000 Albania 1 8633400 0.000
I know that the population is summing up twice due to the above groupby with 'sum' operation, but still I wonder why is the rate_death not calculating the percentage as expected but rather showing as 0.000
Approach 2: (tried as mentioned in this post - Pandas percentage of total with groupby)
location_population = df_data.groupby(['location', 'population']).agg({'new_deaths': 'sum'})
location = df_data.groupby(['location']).agg({'population': 'mean'})
location_population.div(location, level='location') * 100
location population new_deaths population Afghanistan 38928341 NaN NaN Albania 2877800 NaN NaN
But it is coming as NaN.
Please help if anything wrong in these approaches or how to resolve this. Thanks!
You can do -
df = df.groupby(['location']).agg({'new_deaths': sum, 'population': max})
df['rate_death'] = df['new_deaths'] / df['population'] * 100
Result
new_deaths population rate_death
location
Afghanistan 15 38928341 0.000039
Albania 1 2877800 0.000035