Search code examples
python-3.xpandasdataframeanalysis

How to set multindex for duplicated indices


I have a dataframe df with duplicated indices:

        a
o_idx
411     x
412     y
412     x
412     z
450     a
450     b

I would like to assign multindex to the duplicated rows like:

                    a
o_idx    index
411        0        x
412        0        y
412        1        x
412        2        z
450        0        a
450        1        b

where the new index is a 0 if it is not a duplicate. If it is a duplicate the new index should be a range of the length of how many times it is duplicated.

I have tried making two dataframes first by trying to get rid of duplicates:

a = df[~df.index.duplicated(keep=False)]
a = a.reset_index()
a.index = np.zeros(len(a.index), dtype = int)
a = a.reset_index()
a = a.set_index(['index', 'o_idx'])
a.index.names = ['index', 'o_idx']

then getting the duplicates:

b = df[df.index.duplicated(keep=False)]
b = b.reset_index()
b.index = range(len(b.index))
b = b.reset_index()
b = b.set_index(['index', 'o_idx'])
b.index.names = ['index', 'o_idx']

Then to concatenate:

c = pd.concat([a,b])

However this doesn't give me the correct outcome because b is equal to:

                    a
o_idx    index
412        0        y
412        1        x
412        2        z
450        3        a
450        4        b

So I pretty much have the exact same problem as I did in the beginning. Is there a better way to approach this?


Solution

  • Here is one way using cumcount then set_index back

    yourdf=df.assign(index=df.groupby(level=0).cumcount()).set_index('index',append=True)
    yourdf
    Out[568]: 
                 a
    o_idx index   
    411   0      x
    412   0      y
          1      x
          2      z
    450   0      a
          1      b