I have a pandas DataFrame with a two-level multiindex. The second level is numeric and supposed to be sorted and sequential for each unique value of the first-level index, but has gaps. How do I insert the "missing" rows? Sample input:
import pandas as pd
df = pd.DataFrame(list(range(5)),
index=pd.MultiIndex.from_tuples([('A',1), ('A',3),
('B',2), ('B',3), ('B',6)]),
columns='value')
# value
#A 1 0
# 3 1
#B 2 2
# 3 3
# 6 4
Expected output:
# value
#A 1 0
# 2 NaN
# 3 1
#B 2 2
# 3 3
# 4 NaN
# 5 NaN
# 6 4
I suspect I could have used resample
, but I am having trouble converting the numbers to anything date-like.
After much deliberations, I was able to come up with a solution myself. Judging by the fact of how lousy it is, the problem I am facing is not a very typical one.
new_index = d.index.to_frame()\
.groupby(0)[1]\
.apply(lambda x:
pd.Series(1, index=range(x.min(), x.max() + 1))).index
d.reindex(new_index)