Search code examples
pandasindexinggroup-bymulti-index

Ongoing-number as first multiindex in pandas


In a grouped (by id) data frame df

id, last_name, first_name
sdfg, Muller, Peter
fgjh, Smith, Rob
fgjh, Smith, Robert
hjsa, Jackson, Michael

I want to add another index person at the very beginning as an ongoing number, so that we'll get

person, id, last_name, first_name
0, sdfg, Muller, Peter
1, fgjh, Smith, Rob
1, fgjh, Smith, Robert
2, hjsa, Jackson, Michael

I tried to use reset_index(...), but this sets an ongoing number regardless of the grouping.

Is it possible at all?


Solution

  • Here's one way to do this:

    df.set_index((~df.index.duplicated()).cumsum()-1, append=True).swaplevel()
    
    #         last_name  first_name
    #   id                         
    # 0 sdfg     Muller       Peter
    # 1 fgjh      Smith         Rob
    #   fgjh      Smith      Robert
    # 2 hjsa    Jackson     Michael
    

    This approach works by negating the result of df.index.duplicated and taking the cumulative sum (minus 1) to generate an array that increments each time the value in the index changes. It then adds it to your existing index with set_index and append=True, before finally using swaplevel so that the id portion of your new MultiIndex is at level 1.