python pandas dataframe multi-index

Multi indexing in Pandas


I have the Pandas dataframe below. Its index has two levels, Segment Name and Variables.

                                   mean
seg1  daily_time_spend_on_sight      25
      age                            36
      area_income                  1250
      clicked_on_ad                 250
seg2  daily_time_spend_on_sight      10
      age                            26
      area_income                   950
      clicked_on_ad                 125

I need the level 0 index label to appear on every record that belongs to it:

                                   mean
seg1  daily_time_spend_on_sight      25
seg1  age                            36
seg1  area_income                  1250
seg1  clicked_on_ad                 250
seg2  daily_time_spend_on_sight      10
seg2  age                            26
seg2  area_income                   950
seg2  clicked_on_ad                 125

Solution

  • If you have a dataset with a MultiIndex like the one below, you can use .reset_index():

    import numpy as np
    import pandas as pd

    # Two-level index: 'first' is the outer level, 'second' the inner one
    arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
              ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
    tuples = list(zip(*arrays))
    index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
    s = pd.Series(np.random.randn(8), index=index)
    print(s)
    
    first  second
    bar    one      -0.632949
           two      -1.418744
    baz    one      -1.318791
           two       0.194042
    foo    one      -0.139960
           two       0.971686
    qux    one      -0.257964
           two       1.911748
    dtype: float64
    
    

    s.reset_index() moves both index levels into regular columns, so every row carries its first-level label:

      first second         0
    0   bar    one -0.632949
    1   bar    two -1.418744
    2   baz    one -1.318791
    3   baz    two  0.194042
    4   foo    one -0.139960
    5   foo    two  0.971686
    6   qux    one -0.257964
    7   qux    two  1.911748
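
Applied to the dataframe from the question, the same call could look like the sketch below. The frame is rebuilt by hand from the values shown in the question, and the level names "Segment Name" and "Variables" are assumed from the question's wording, so adjust them to whatever your real index uses.

```python
import pandas as pd

# Rebuild the question's frame by hand (values copied from the question;
# the level names are an assumption based on its description).
index = pd.MultiIndex.from_product(
    [["seg1", "seg2"],
     ["daily_time_spend_on_sight", "age", "area_income", "clicked_on_ad"]],
    names=["Segment Name", "Variables"],
)
df = pd.DataFrame({"mean": [25, 36, 1250, 250, 10, 26, 950, 125]}, index=index)

# Move both index levels into ordinary columns; the segment label is
# now stored on every row instead of being implied by the index.
flat = df.reset_index()
print(flat)

# Move only the outer level out and keep "Variables" as the index.
partial = df.reset_index(level="Segment Name")
print(partial)
```

Note that reset_index() returns a new object and changes the structure of the data, which is useful if you want to export or merge it. If you only care about how the frame prints, the display option described next leaves the index untouched.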
    

    You can also keep the MultiIndex as it is and change only how it is displayed, using pd.option_context('display.multi_sparse', False):

    with pd.option_context('display.multi_sparse', False):
        print(s)
    

    Output (the values differ from the earlier print because they come from a separate run of np.random.randn):

    first  second
    bar    one       1.157404
    bar    two      -0.000333
    baz    one      -0.774613
    baz    two      -1.962658
    foo    one       1.337555
    foo    two       0.856814
    qux    one       0.506146
    qux    two       0.755346
    dtype: float64
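
A minimal sketch of the same display option on the question's dataframe, again rebuilt by hand with assumed level names. It renders the frame to text inside and outside the context manager, so you can see that only the printed form changes.

```python
import pandas as pd

# Rebuild the question's frame by hand (level names are an assumption).
index = pd.MultiIndex.from_product(
    [["seg1", "seg2"],
     ["daily_time_spend_on_sight", "age", "area_income", "clicked_on_ad"]],
    names=["Segment Name", "Variables"],
)
df = pd.DataFrame({"mean": [25, 36, 1250, 250, 10, 26, 950, 125]}, index=index)

# Inside the context, repeated outer labels are printed on every row.
with pd.option_context("display.multi_sparse", False):
    dense_text = df.to_string()

# Outside the context, the default sparse display applies again.
sparse_text = df.to_string()

print(dense_text)
print(sparse_text)
```

The option only lasts for the with block. To keep it for the whole session, pd.set_option('display.multi_sparse', False) should have the same effect on printing.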
    

    For more information about multi-indexing, see the pandas user guide.

    Hope it helps