I am using groupby and agg to summarize groups of dataframe rows. I summarize each group in terms of its count and size:
>>> import pandas as pd
>>> df = pd.DataFrame([
...     [1, 2, 3],
...     [2, 3, 1],
...     [3, 2, 1],
...     [2, 1, 3],
...     [1, 3, 2],
...     [3, 3, 3]],
...     columns=['A', 'B', 'C'])
>>> gbB = df.groupby('B',as_index=False)
>>> Cagg = gbB.C.agg(['count','size'])
>>> Cagg
   B  count  size
0  1      1     1
1  2      2     2
2  3      3     3
The result looks like a dataframe with columns for the grouping variable B and for the summaries count and size:
>>> Cagg.columns
Index(['B', 'count', 'size'], dtype='object')
However, I can't access the count and size columns individually for further manipulation as series or for conversion with to_list:
>>> Cagg.count
<bound method DataFrame.count of    B  count  size
0  1      1     1
1  2      2     2
2  3      3     3>
>>> Cagg.size
9
Can I access the individual column-like data with headings count and size?
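Don't use attribute access for these columns: the names count and size clash with the existing DataFrame method count and property size, so Cagg.count and Cagg.size return those instead of the columns.
Use indexing with square brackets instead: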
Cagg['count']
# 0 1
# 1 2
# 2 3
# Name: count, dtype: int64
Cagg['size']
# 0 1
# 1 2
# 2 3
# Name: size, dtype: int64
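For completeness, here is a minimal self-contained sketch of the same idea, including the to_list conversion asked about. It rebuilds the summary with reset_index() rather than as_index=False, on the assumption that this gives the same B/count/size layout across pandas versions. The names n_count and n_size in the last step are hypothetical, chosen only to show that attribute access works again once the column names no longer clash.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [2, 3, 1], [3, 2, 1], [2, 1, 3], [1, 3, 2], [3, 3, 3]],
    columns=["A", "B", "C"],
)

# Summary with B as an ordinary column (assumed to match the layout in the question).
Cagg = df.groupby("B")["C"].agg(["count", "size"]).reset_index()

# Attribute access resolves to DataFrame members, not to the columns.
attr_is_method = callable(Cagg.count)  # DataFrame.count is a method
total_cells = Cagg.size                # DataFrame.size is rows * columns

# Bracket indexing returns the columns as Series, which can then be converted.
counts = Cagg["count"].to_list()
sizes = Cagg["size"].to_list()

# Optional: rename the columns (hypothetical names) so they no longer
# shadow DataFrame members; attribute access then returns the column.
renamed = Cagg.rename(columns={"count": "n_count", "size": "n_size"})
n_sizes = renamed.n_size.to_list()
```

Bracket indexing is the safer habit in general, since it works for any column name, including ones that are not valid Python identifiers.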