I'm working with python 3.7 and pandas in JupyterLab and trying to use reindex to add missing rows in a MultiIndex table (index should be a product with the elements from Col1 and Col2).
df2 = pd.DataFrame({'Col1': [10, 11, 12, 12, 12, 11,],
'Col2': ['2012', '2012', '2013', '2014', '2015', '2012'],
'result': [1, 1, 1, 1, 1, 1]})
start, end = df2['Col1'].min(), df2['Col1'].max()
Col2_values = df2['Col2'].sort_values().unique()
newIndex = pd.MultiIndex.from_product( [range(start, end+1), Col2_values ], names=['Col1', 'Col2'],)
df2 = df2.reindex( newIndex, columns=['result'] )
print(df2)
What it shows is:
result
Col1 Col2
10 2012 NaN
2013 NaN
Could anybody tell me how I can preserve the values of the column result?
Use DataFrame.set_index
for MultiIndex
from col1
and col2
for reindex
by MultiIndex
, else reindex
use default RangeIndex
(0-6
), no value match and get all NaN
s is output column:
#last year is 2016
df2 = pd.DataFrame({'Col1': [10, 11, 12, 12, 12, 11,],
'Col2': ['2012', '2012', '2013', '2014', '2015', '2016'],
'result': [1, 1, 1, 1, 1, 1]})
start, end = df2['Col1'].min(), df2['Col1'].max()
Col2_values = df2['Col2'].sort_values().unique()
newIndex = pd.MultiIndex.from_product( [range(start, end+1), Col2_values ],
names=['Col1', 'Col2'],)
print (newIndex)
MultiIndex([(10, '2012'),
(10, '2013'),
(10, '2014'),
(10, '2015'),
(10, '2016'),
(11, '2012'),
(11, '2013'),
(11, '2014'),
(11, '2015'),
(11, '2016'),
(12, '2012'),
(12, '2013'),
(12, '2014'),
(12, '2015'),
(12, '2016')],
names=['Col1', 'Col2'])
df2 = df2.set_index(['Col1','Col2']).reindex( newIndex, columns=['result'] )
print(df2)
result
Col1 Col2
10 2012 1.0
2013 NaN
2014 NaN
2015 NaN
2016 NaN
11 2012 1.0
2013 NaN
2014 NaN
2015 NaN
2016 1.0
12 2012 NaN
2013 1.0
2014 1.0
2015 1.0
2016 NaN