Python: How to pad with zeros?

Assuming we have a dataframe as below:

import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c'],
                   'col2': ['0.5', '0.78', '0.78', '0.4', '2', '9', '2', '7']})

I counted the number of rows for each unique value in Col1 (a has 4 rows; b and c have 2 rows each) with:

df.groupby(['Col1']).size()

and I get the output as

Col1
a    4
b    2
c    2
dtype: int64

Next, I would like to find which of a, b, c has the most rows (here a, with 4) and pad every other group with zeros until it has the same number of rows. The number of zeros to add is the difference between the maximum and the group's own row count: b and c have 2 rows each and the maximum is 4, so each of them needs 2 more zeros. The zeros must be added at the end of each group.
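To make the target concrete, the number of zeros each group needs can be read off the sizes computed above. This is only a minimal sketch of the arithmetic; the names sizes, target and padding are illustrative and do not add the rows themselves:

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c'],
                   'col2': ['0.5', '0.78', '0.78', '0.4', '2', '9', '2', '7']})

# Rows per group, as in the groupby above
sizes = df.groupby('Col1').size()

# Size of the largest group; every group should end up with this many rows
target = sizes.max()

# Zeros still missing per group (0 for the largest group)
padding = target - sizes
print(padding)
```

For this data, padding is 0 for a and 2 for both b and c.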

I want the zero padding because I need to apply a fixed-size window to every group (a, b, c) in order to plot graphs.

Solution

You can create a per-group counter with GroupBy.cumcount, use it as the second level of a MultiIndex, and then call DataFrame.reindex with all combinations of both levels, built by MultiIndex.from_product:

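# Number the rows within each group (0, 1, 2, ...) and use that as a second index level
df1 = df.set_index(['Col1', df.groupby('Col1').cumcount()])

# Build every (Col1, position) pair; reindexing adds the missing rows, filled with 0
mux = pd.MultiIndex.from_product(df1.index.levels, names=df1.index.names)
df2 = df1.reindex(mux, fill_value=0).reset_index(level=1, drop=True).reset_index()
print(df2)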
   Col1  col2
0     a   0.5
1     a  0.78
2     a  0.78
3     a   0.4
4     b     2
5     b     9
6     b     0
7     b     0
8     c     2
9     c     7
10    c     0
11    c     0
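
One thing to watch: col2 holds strings in the sample data, so the padded rows contain the integer 0 next to strings such as '0.5'. If the values are going to be plotted, it may help to convert the column to numbers first. Below is a minimal sketch of the same approach wrapped in a helper; the function name pad_groups and the level name 'pos' are made up for illustration and are not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c'],
                   'col2': ['0.5', '0.78', '0.78', '0.4', '2', '9', '2', '7']})


def pad_groups(frame, key, fill_value=0):
    """Pad every group of `key` with `fill_value` rows up to the largest group size."""
    # Position of each row inside its group: 0, 1, 2, ...
    pos = frame.groupby(key).cumcount().rename('pos')
    indexed = frame.set_index([key, pos])
    # Every (group, position) pair, including the ones that do not exist yet
    full = pd.MultiIndex.from_product(indexed.index.levels,
                                      names=indexed.index.names)
    return (indexed.reindex(full, fill_value=fill_value)
                   .reset_index(level='pos', drop=True)
                   .reset_index())


# Convert the string column to numbers before padding (assumes every value parses)
numeric = df.assign(col2=pd.to_numeric(df['col2']))
padded = pad_groups(numeric, 'Col1')

# Every group should now have as many rows as the largest one
sizes = padded.groupby('Col1').size()
print(padded)
print(sizes)
```

After padding, every group has the same number of rows, so a fixed-size window can be applied to a, b and c alike.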