i want to remove all the rows that equal to 0, so only rows that equal to 4, 1, 2 and 3 remain from the series below
my_series = df.groupby(by=['Continent','bins']).size()
my_series output:
Continent bins
Asia (2.212, 15.753] 4
(15.753, 29.227] 1
(29.227, 42.701] 0
(42.701, 56.174] 0
(56.174, 69.648] 0
Australia (2.212, 15.753] 1
(15.753, 29.227] 0
(29.227, 42.701] 0
(42.701, 56.174] 0
(56.174, 69.648] 0
Europe (2.212, 15.753] 1
(15.753, 29.227] 3
(29.227, 42.701] 2
(42.701, 56.174] 0
(56.174, 69.648] 0
North America (2.212, 15.753] 1
(15.753, 29.227] 0
(29.227, 42.701] 0
(42.701, 56.174] 0
(56.174, 69.648] 1
South America (2.212, 15.753] 0
(15.753, 29.227] 0
(29.227, 42.701] 0
(42.701, 56.174] 0
(56.174, 69.648] 1
dtype: int64
the result type must be series, not dataframe.
i tried "dataframe-like" approach like
df = df[df['bins']!=0]
but surely this didn't work out.
and same failure here(apparently it's not applyable to a multiindex series)
my_series.difference(list)
Because bins
has the category
dtype when you use groupby
, Pandas generates all combinations of bins
, that's why you have 0 in your result.
From documentation
observed bool, default False
This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.
First solution based on your code, use observed=True
:
my_series = df.groupby(['Continent','bins'], observed=True).size()
Second solution, more straightforward, use value_counts
:
my_series = df.value_counts(['Continent', 'bins'])