Search code examples
pythonpandasmulti-index

Advanced Pandas: Create Multi-Index panda dataframe from certain row indices


I have a data set like this:

    m   n   o  
0   2   22  42  
1   3   23  43  
2   4   24  44  
3   5   25  45  
4   6   26  46  
5   7   27  47  
6   8   28  48  
7   9   29  49  
8   10  30  50  
9   11  31  51  

How can we convert this into multi-index dataframe:

Index   m    n  o  
  A  
     0  2   22  42  
     1  3   23  43  
     2  4   24  44  
  B  
    4   6   26  46  
  C  
    6   8   28  48  
    7   9   29  49  
    8   10  30  50 

** My attempt**

import numpy as np 
import pandas as pd

df = pd.DataFrame({'m': np.arange(2,12),
                   'n': np.arange(22,32),
                  'o': np.arange(42,52)})

df

** The groupby method**

# Required index and their names
idx = [3,5,9]  # A is 0,1,2 B is 4 and C is 6,7,8
idx_orig = idx.copy()
idx_names = ['A','B','C']

# Attempt
idx_diff = np.diff(idx)
idx_diff = np.hstack((idx[0]+1,idx_diff)) # Add the first index value
idx_diff = idx_diff - 1 # Decrease index number
idx_names = np.repeat(idx_names,idx_diff)

# Drop rows with given indices
df = df.drop(df.index[idx_orig])

# Assign new col
df['Names'] = idx_names
#df.groupby('Names').count()
df

The output

    m   n   o   Names
0   2   22  42  A
1   3   23  43  A
2   4   24  44  A
4   6   26  46  B
6   8   28  48  C
7   9   29  49  C
8   10  30  50  C

Here, I wanted to get a multi-index dataframe with all the rows, however, groupby gives only the counts.


Solution

  • Use:

    idx = [3,5,9] 
    
    idx_names = ['A','B','C']
    d = dict(enumerate(idx_names))
    
    #get boolean mask for create cumulative sum and filter out idx rows
    mask = df.index.isin(idx)
    df['g'] = mask.cumsum()
    #map by dictioanry
    df['g'] = df['g'].map(d)
    #create MultiIndex and change order of levels
    df = df[~mask].set_index('g', append=True).swaplevel(0,1)
    print (df)
          m   n   o
    g              
    A 0   2  22  42
      1   3  23  43
      2   4  24  44
    B 4   6  26  46
    C 6   8  28  48
      7   9  29  49
      8  10  30  50