python pandas numpy zero-padding

Python: How to pad with zeros?


Assume we have a DataFrame like the one below:

import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c'],
                   'col2': ['0.5', '0.78', '0.78', '0.4', '2', '9', '2', '7']})

I counted the number of rows for each unique value in Col1 (a has 4 rows; b and c have 2 rows each) with:

df.groupby(['Col1']).size()

and I get this output:

Col1
a    4
b    2
c    2
dtype: int64

Next, I would like to find which of a, b, and c has the most rows (here a, with 4) and pad the others with zeros up to that count. The number of zeros for each group is the difference between the maximum count and the group's own count: b and c have 2 rows each and the maximum is 4, so each needs 2 more zeros. The zeros must be added at the end of each group.
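
To make the target concrete, here is a minimal sketch of how many zeros each group would need. The names `sizes` and `pad` are only illustrative and are not part of the data above.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c'],
                   'col2': ['0.5', '0.78', '0.78', '0.4', '2', '9', '2', '7']})

# Rows per group, as computed earlier with groupby(...).size()
sizes = df.groupby('Col1').size()

# Zeros needed per group: largest group size minus the group's own size
pad = sizes.max() - sizes
print(pad)
```

For this data, `pad` should be 0 for a and 2 for both b and c.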

I want to pad with zeros because I want to apply a fixed-size window to all the groups (a, b, c) to plot graphs.


Solution

  • You can create a per-group counter with GroupBy.cumcount, use it as a second index level to form a MultiIndex, and then call DataFrame.reindex with all combinations generated by MultiIndex.from_product:

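    # Number the rows within each Col1 group (0, 1, 2, ...) and use that as a second index level
    df1 = df.set_index(['Col1', df.groupby('Col1').cumcount()])

    # Build every (group, position) combination
    mux = pd.MultiIndex.from_product(df1.index.levels, names=df1.index.names)
    # Reindex so missing rows get col2 = 0, then drop the counter level and restore Col1 as a column
    df2 = df1.reindex(mux, fill_value=0).reset_index(level=1, drop=True).reset_index()
    print(df2)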
       Col1  col2
    0     a   0.5
    1     a  0.78
    2     a  0.78
    3     a   0.4
    4     b     2
    5     b     9
    6     b     0
    7     b     0
    8     c     2
    9     c     7
    10    c     0
    11    c     0
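
As a quick check of the approach above, the sketch below re-runs the same steps and confirms that every group ends up with the same number of rows. It makes one assumption beyond the answer: col2 is converted to float first, because the sample data stores the values as strings and `fill_value=0` would otherwise mix integer zeros with strings. The names `padded_sizes` and `fixed` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c'],
                   'col2': ['0.5', '0.78', '0.78', '0.4', '2', '9', '2', '7']})

# Assumption: convert the string values to floats so the padded zeros share a dtype
df['col2'] = df['col2'].astype(float)

# Same steps as the answer: per-group counter, full index, reindex with zeros
df1 = df.set_index(['Col1', df.groupby('Col1').cumcount()])
mux = pd.MultiIndex.from_product(df1.index.levels, names=df1.index.names)
df2 = df1.reindex(mux, fill_value=0).reset_index(level=1, drop=True).reset_index()

# Every group should now have as many rows as the largest original group
padded_sizes = df2.groupby('Col1').size()
print(padded_sizes)

# One fixed-length row per group, which may be convenient for a fixed-size window
fixed = df2['col2'].to_numpy().reshape(len(padded_sizes), -1)
print(fixed)
```

The reshape relies on `df2` being ordered group by group, which is the order `MultiIndex.from_product` produces here.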