Search code examples
pythonpandasdataframemulti-indexreindex

Create multiindex from existing dataframe


I've spent hours browsing everywhere now to try to create a multiindex from dataframe in pandas. This is the dataframe I have (posting excel sheet mockup. I do have this in pandas dataframe):

have

And this is what I want:

want

I have tried

newmulti = currentDataFrame.set_index(['user_id','account_num'])

But it returns a dataframe, not a multiindex. Also, I could not figure out how to make 'user_id' level 0 and 'account_num' level 1. I think this must be trivial but I've read so many posts, tutorials, etc. and still could not figure it out. Partly because I'm a very visual person and most posts are not. Please help!


Solution

  • You could simply use groupby in this case, which will create the multi-index automatically when it sums the sales along the requested columns.

    df.groupby(['user_id', 'account_num', 'dates']).sales.sum().to_frame()
    

    You should also be able to simply do this:

    df.set_index(['user_id', 'account_num', 'dates'])
    

    Although you probably want to avoid any duplicates (e.g. two or more rows with identical user_id, account_num and date values but different sales figures) by summing them, which is why I recommended using groupby.

    If you need the multi-index, you can simply access viat new_df.index where new_df is the new dataframe created from either of the two operations above.

    And user_id will be level 0 and account_num will be level 1.