I have a dataframe which looks like this:
dfB
name value country
benzene spice Australia
benzene spice Australia
benzene spice Australia
benzene herbs Australia
benzene herbs Americas
benzene anise Poland
methyl herbs
methyl herbs Americas
methyl spice Americas
alcohol spice Germany
alcohol spice Germany
I want to createa a different dataframe which is an aggregation of the country column, something like this:
dfB
name value country count
benzene spice Australia 3
benzene herbs Australia 1
benzene herbs Americas 1
benzene anise Poland 1
methyl herbs 1
methyl herbs Americas 1
methyl spice Americas 1
alcohol spice Germany 2
The idea is to aggregare the country column and create a count for the country column for each unique "name" and "value" combination. If there is a blank or Nan it should also he treated differently.
I tried using groupby:
grouped = dfB.groupby(["name", "value", "country"]).agg({"country": "count"})
but it does not seem to create the dataframe how I intend to. How can I do this?
Use value_counts
or groupby
to not modify order:
out = dfB.value_counts(["name", "value", "country"], sort=False, dropna=False) \
.rename('count').reset_index()
out.loc[out['country'].isna(), 'count'] = 1
out1 = dfB.groupby(["name", "value", "country"], sort=False, dropna=False) \
.size().reset_index(name='count')
out1.loc[out1['country'].isna(), 'count'] = 1
>>> out
name value country count
0 alcohol spice Germany 2
1 benzene anise Poland 1
2 benzene herbs Americas 1
3 benzene herbs Australia 1
4 benzene spice Australia 3
5 methyl herbs Americas 1
6 methyl herbs NaN 1
7 methyl spice Americas 1
>>> out1
name value country count
0 benzene spice Australia 3
1 benzene herbs Australia 1
2 benzene herbs Americas 1
3 benzene anise Poland 1
4 methyl herbs NaN 1
5 methyl herbs Americas 1
6 methyl spice Americas 1
7 alcohol spice Germany 2