I have a dataframe df with duplicated indices:
a
o_idx
411 x
412 y
412 x
412 z
450 a
450 b
I would like to assign multindex to the duplicated rows like:
a
o_idx index
411 0 x
412 0 y
412 1 x
412 2 z
450 0 a
450 1 b
where the new index is a 0 if it is not a duplicate. If it is a duplicate the new index should be a range of the length of how many times it is duplicated.
I have tried making two dataframes first by trying to get rid of duplicates:
a = df[~df.index.duplicated(keep=False)]
a = a.reset_index()
a.index = np.zeros(len(a.index), dtype = int)
a = a.reset_index()
a = a.set_index(['index', 'o_idx'])
a.index.names = ['index', 'o_idx']
then getting the duplicates:
b = df[df.index.duplicated(keep=False)]
b = b.reset_index()
b.index = range(len(b.index))
b = b.reset_index()
b = b.set_index(['index', 'o_idx'])
b.index.names = ['index', 'o_idx']
Then to concatenate:
c = pd.concat([a,b])
However this doesn't give me the correct outcome because b is equal to:
a
o_idx index
412 0 y
412 1 x
412 2 z
450 3 a
450 4 b
So I pretty much have the exact same problem as I did in the beginning. Is there a better way to approach this?
Here is one way using cumcount
then set_index
back
yourdf=df.assign(index=df.groupby(level=0).cumcount()).set_index('index',append=True)
yourdf
Out[568]:
a
o_idx index
411 0 x
412 0 y
1 x
2 z
450 0 a
1 b