I'm working on an assignment for Applied Data Science.
Question: Cut % Renewable into 5 bins. Group Top15 by the Continent, as well as these new % Renewable bins. How many countries are in each of these groups? This function should return a Series with a MultiIndex of Continent, then the bins for % Renewable. Do not include groups with no countries.
This is my code:
def answer_twelve():
    Top15 = answer_one()
    ContinentDict = {'China':'Asia',
                     'United States':'North America',
                     'Japan':'Asia',
                     'United Kingdom':'Europe',
                     'Russian Federation':'Europe',
                     'Canada':'North America',
                     'Germany':'Europe',
                     'India':'Asia',
                     'France':'Europe',
                     'South Korea':'Asia',
                     'Italy':'Europe',
                     'Spain':'Europe',
                     'Iran':'Asia',
                     'Australia':'Australia',
                     'Brazil':'South America'}
    Top15['Continent'] = Top15.index.to_series().map(ContinentDict)
    Top15['bins'] = pd.cut(Top15['% Renewable'],5)
    return pd.Series(Top15.groupby(by = ['Continent', 'bins']).size())#,apply(lambda x:s if x['Rank']==0 continue))
answer_twelve()
This is my output for the above code
Continent      bins
Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    0
Australia      (2.212, 15.753]     1
               (15.753, 29.227]    0
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    0
Europe         (2.212, 15.753]     1
               (15.753, 29.227]    3
               (29.227, 42.701]    2
               (42.701, 56.174]    0
               (56.174, 69.648]    0
North America  (2.212, 15.753]     1
               (15.753, 29.227]    0
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    1
South America  (2.212, 15.753]     0
               (15.753, 29.227]    0
               (29.227, 42.701]    0
               (42.701, 56.174]    0
               (56.174, 69.648]    1
dtype: int64
Required output is
Continent      bins
Asia           (2.212, 15.753]     4
               (15.753, 29.227]    1
Australia      (2.212, 15.753]     1
Europe         (2.212, 15.753]     1
               (15.753, 29.227]    3
               (29.227, 42.701]    2
North America  (2.212, 15.753]     1
               (56.174, 69.648]    1
South America  (56.174, 69.648]    1
Name: Countries, dtype: int64
I want to get rid of the zeros. I tried using:
pd.Series(Top15.groupby(by = ['Continent', 'bins']).size().apply(lambda x:s if x['Rank']==0 continue))
But I keep getting the following error
File "<ipython-input-317-14bc05bb2137>", line 20
return pd.Series(Top15.groupby(by = ['Continent', 'bins']).size().apply(lambda x:s if x['Rank']==0 continue))
^
SyntaxError: invalid syntax
I'm unable to figure out my mistake. Please help me!
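The SyntaxError comes from the lambda: continue is a statement, so it cannot appear inside a lambda, and a conditional expression (a if cond else b) needs an else branch. You don't need apply here at all. Use pandas and drop the rows where the value is zero with a boolean mask.
If column_name is your column in a DataFrame df:
df = df[df.column_name != 0]
In your case groupby(...).size() returns a Series rather than a DataFrame, so filter the Series itself: keep only the entries where the count is not 0.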
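As a rough sketch of how that filter could look for the grouped counts in the question: the small DataFrame below is a made-up stand-in for Top15 (the country labels and percentages are hypothetical, not the assignment data), and the names df, counts, nonzero and observed_only are only for illustration. It assumes a pandas version whose groupby accepts the observed parameter.

```python
import pandas as pd

# Hypothetical stand-in for Top15; the values are invented for illustration.
df = pd.DataFrame(
    {
        "Continent": ["Asia", "Asia", "Asia", "Europe", "Europe", "South America"],
        "% Renewable": [5.0, 6.0, 20.0, 10.0, 35.0, 70.0],
    },
    index=["A", "B", "C", "D", "E", "F"],
)

# Cut the percentages into 5 equal-width bins (a categorical column).
df["bins"] = pd.cut(df["% Renewable"], 5)

# observed=False keeps every Continent/bin combination, including empty ones,
# which is where the zero counts in the question come from.
counts = df.groupby(["Continent", "bins"], observed=False).size()

# Boolean mask on the Series: keep only groups that contain a country.
nonzero = counts[counts != 0].rename("Countries")

# Alternative: ask groupby to skip unobserved bin categories up front.
observed_only = df.groupby(["Continent", "bins"], observed=True).size()

print(nonzero)
```

Either route should give the same non-empty groups. The mask works on any Series of counts, while observed=True avoids creating the empty groups in the first place.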