Search code examples
pythonpandasmulti-index

How to reshape a multiindex dataframe using pandas stack/melt


*I'm editing this to hopefully make it clearer. I am trying to utilize combination of stack and melt to turn this sample dataframe into the below desired output.

Using pandas I load in the excel table below with the end goal of making it tidy and long.

 df=pd.read('myfile.xlsx)
  df1= df.stack(level=1)
   df2=df1.melt(col_level =1)

Sample dataframe

Good customer service Fresh ingredients Lots of seating
Papa johns Segment Avg Papa johns Segment Avg Papa Johns Segment Avg
12/1/2019 70 88 12/1/2019 2.2 5.5 12/1/2019 5.2 8.8
12/2/2019 50 78 12/2/2019 6.8 4.4 12/2/2019 5.3 7.8
12/3/2019 60 77 12/3/2019 8.9 2.3 12/3/2019 6.3 5.6
12/4/2019 30 76 12/4/2019 7.3 7.3 12/4/2019 7.9 4.6

Here is the end result I'm looking for

Date Restaurant Question Score
12/1/2019 PapaJohns Good customer service 70
12/2/2019 PapaJohns Good customer service 50
12/3/2019 PapaJohns Good customer service 60
12/4/2019 PapaJohns Good customer service 30
12/4/2019 PapaJohns Fresh ingredients 2.2
12/4/2019 PapaJohns Fresh ingredients 6.8
12/4/2019 PapaJohns Fresh ingredients 8.9
12/4/2019 PapaJohns Fresh ingredients 7.3

Solution

  • Assuming the following is how your excel file looks like:

    enter image description here

    Below are the steps how I would approach it using Pandas:

    1. Read the file but skip the first two rows and remove the empty columns
    2. Convert the DF into a multi-index DF
    3. Melt and concatenate the DF using a for loop
    # Step 1
    df = pd.read_excel("raw_data.xlsx", skiprows=2, header=None)
    df.dropna(axis=1, inplace=True)
    
    
    # Step 2(a): Create the multi-index column name
    col_name = np.array(['date', 'Papa Johns', 'Segment Avg'])
    category = ['Good Customer Service', 'Fresh Ingredients', 'Lots of Seating']
    outer = [cat for cat in category for _ in range(3)]
    inner = [col for _ in range(3) for col in col_name]
    
    
    # Step 2(b): Change the DF into Multi-index
    df.columns = [outer, inner]
    print(df)
    
     Good Customer Service                        Fresh Ingredients             \
                       date Papa Johns Segment Avg              date Papa Johns   
    0            2019-01-12         70          88        2019-01-12        2.2   
    1            2019-02-12         50          78        2019-02-12        6.8   
    2            2019-03-12         60          77        2019-03-12        8.9   
    3            2019-04-12         30          76        2019-04-12        7.3   
    
                  Lots of Seating                         
      Segment Avg            date Papa Johns Segment Avg  
    0         5.5      2019-01-12        5.2         8.8  
    1         4.4      2019-02-12        5.3         7.8  
    2         2.3      2019-03-12        6.3         5.6  
    3         7.3      2019-04-12        7.9         4.6
    
    
    # Step 3
    final_df = pd.DataFrame(columns=['date', 'question', 'restaurant', 'score']) #Empty DF
    
    for cat in category:
        temp_df = df.melt(id_vars=[(cat, 'date')],
                          value_vars=[(cat, 'Papa Johns')])
        temp_df.columns = ['date', 'question', 'restaurant', 'score']
        final_df = pd.concat([final_df, temp_df])
        
    print(final_df)
    
            date               question  restaurant score
    0 2019-01-12  Good Customer Service  Papa Johns    70
    1 2019-02-12  Good Customer Service  Papa Johns    50
    2 2019-03-12  Good Customer Service  Papa Johns    60
    3 2019-04-12  Good Customer Service  Papa Johns    30
    0 2019-01-12      Fresh Ingredients  Papa Johns   2.2
    1 2019-02-12      Fresh Ingredients  Papa Johns   6.8
    2 2019-03-12      Fresh Ingredients  Papa Johns   8.9
    3 2019-04-12      Fresh Ingredients  Papa Johns   7.3
    0 2019-01-12        Lots of Seating  Papa Johns   5.2
    1 2019-02-12        Lots of Seating  Papa Johns   5.3
    2 2019-03-12        Lots of Seating  Papa Johns   6.3
    3 2019-04-12        Lots of Seating  Papa Johns   7.9