Search code examples
pythonpandasmulti-index

Inserting "missing" multiindex rows into a Pandas Dataframe


I have a pandas DataFrame with a two-level multiindex. The second level is numeric and supposed to be sorted and sequential for each unique value of the first-level index, but has gaps. How do I insert the "missing" rows? Sample input:

import pandas as pd
df = pd.DataFrame(list(range(5)),
                  index=pd.MultiIndex.from_tuples([('A',1), ('A',3),
                                                   ('B',2), ('B',3), ('B',6)]),
                  columns='value')
#     value
#A 1      0
#  3      1
#B 2      2
#  3      3
#  6      4

Expected output:

#     value
#A 1      0
#  2    NaN
#  3      1
#B 2      2
#  3      3
#  4    NaN
#  5    NaN
#  6      4

I suspect I could have used resample, but I am having trouble converting the numbers to anything date-like.


Solution

  • After much deliberations, I was able to come up with a solution myself. Judging by the fact of how lousy it is, the problem I am facing is not a very typical one.

    new_index = d.index.to_frame()\
                    .groupby(0)[1]\
                    .apply(lambda x:
                             pd.Series(1, index=range(x.min(), x.max() + 1))).index
    d.reindex(new_index)