Search code examples
pythonpandasmulti-index

python reindex doesn't show column values


I'm working with python 3.7 and pandas in JupyterLab and trying to use reindex to add missing rows in a MultiIndex table (index should be a product with the elements from Col1 and Col2).

df2 = pd.DataFrame({'Col1': [10, 11, 12, 12, 12, 11,],
                   'Col2': ['2012', '2012', '2013', '2014', '2015', '2012'],
                   'result': [1, 1, 1, 1, 1, 1]})

start, end = df2['Col1'].min(), df2['Col1'].max()
Col2_values = df2['Col2'].sort_values().unique()
newIndex = pd.MultiIndex.from_product( [range(start, end+1), Col2_values ], names=['Col1', 'Col2'],)
 
df2 = df2.reindex( newIndex, columns=['result'] )
print(df2)

What it shows is:

           result
Col1 Col2        
10   2012     NaN
     2013     NaN

Could anybody tell me how I can preserve the values of the column result?


Solution

  • Use DataFrame.set_index for MultiIndex from col1 and col2 for reindex by MultiIndex, else reindex use default RangeIndex (0-6), no value match and get all NaNs is output column:

    #last year is 2016
    df2 = pd.DataFrame({'Col1': [10, 11, 12, 12, 12, 11,],
                       'Col2': ['2012', '2012', '2013', '2014', '2015', '2016'],
                       'result': [1, 1, 1, 1, 1, 1]})
    
    start, end = df2['Col1'].min(), df2['Col1'].max()
    Col2_values = df2['Col2'].sort_values().unique()
    newIndex = pd.MultiIndex.from_product( [range(start, end+1), Col2_values ], 
                                           names=['Col1', 'Col2'],)
    print (newIndex) 
    MultiIndex([(10, '2012'),
                (10, '2013'),
                (10, '2014'),
                (10, '2015'),
                (10, '2016'),
                (11, '2012'),
                (11, '2013'),
                (11, '2014'),
                (11, '2015'),
                (11, '2016'),
                (12, '2012'),
                (12, '2013'),
                (12, '2014'),
                (12, '2015'),
                (12, '2016')],
               names=['Col1', 'Col2'])
    

    df2 = df2.set_index(['Col1','Col2']).reindex( newIndex, columns=['result'] )
    print(df2)
               result
    Col1 Col2        
    10   2012     1.0
         2013     NaN
         2014     NaN
         2015     NaN
         2016     NaN
    11   2012     1.0
         2013     NaN
         2014     NaN
         2015     NaN
         2016     1.0
    12   2012     NaN
         2013     1.0
         2014     1.0
         2015     1.0
         2016     NaN