Search code examples
pandasdataframepivot-tablepandas-melt

Melt on pandas dataframes with multi-level


I have a data table that looks like this:

Original Table

enter image description here

Importing this into pandas using df = pd.read_excel('H:/test/test.xlsx', header=[0,1,2]) gives this dataframe:

    Day                 Cats                   Dogs              
    Unnamed: 0_level_1  Small       Big        Little     Large    
    Unnamed: 0_level_2  young  old  young old  young old  young old
0   1                   1      2    1     5    3     2    8     6
1   2                   2      4    3     6    0     1    4     6
2   3                   3      6    6     7    9     3    5     0

I want to convert the dataframe into something that looks like this:

    Day Count  Animal   Size    Age
0   1   1      Cats     Small   young
1   2   2      Cats     Small   young
2   3   3      Cats     Small   young
3   1   2      Cats     Small   old
4   2   4      Cats     Small   old
5   3   6      Cats     Small   old
6   1   1      Cats     Big     young
7   2   3      Cats     Big     young
8   3   6      Cats     Big     young
9   1   5      Cats     Big     old
10  2   6      Cats     Big     old
11  3   7      Cats     Big     old
12  1   3      Dogs     Little  young
13  2   0      Dogs     Little  young
14  3   9      Dogs     Little  young
15  1   2      Dogs     Little  old
16  2   1      Dogs     Little  old
17  3   3      Dogs     Little  old
18  1   8      Dogs     Large   young
19  2   4      Dogs     Large   young
20  3   5      Dogs     Large   young
21  1   6      Dogs     Large   old
22  2   6      Dogs     Large   old
23  3   0      Dogs     Large   old

I tried this method to achieve the result:

df.columns = ['_'.join(col).strip() for col in df.columns.values]
df.rename(columns={'Day_Unnamed: 0_level_1_Unnamed: 0_level_2':'Day'}, inplace=True)
df = df.melt(id_vars=['Day'], var_name='INFO', value_name='Count')
df['INFO'] = df['INFO'].str.split('_')
df['Animal'] = df['INFO'].str[0]
df['Size'] = df['INFO'].str[1]
df['Age'] = df['INFO'].str[2]
df.drop(columns='INFO',inplace=True)

What I've done is I've flattened my multi-level headers into a plain dataframe, and used meltto unpivot the dataframe. Then I separated my df[INFO] column into the specific columns 'Animal','Size','Age'.

This method works, but I want to know if there are any ways to achieve this without having to flatten the table into a plain format and splitting the 'INFO' column because the dataset I want to work with have far more columns than just three.


Solution

  • Use melt:

    out = (
      df.set_index(df.columns[0])
        .melt(var_name=['Animal', 'Size', 'Age'], value_name='Count', ignore_index=False)
        .rename_axis('Day')
        .reset_index()
    )
    

    Output:

    >>> out
        Day Animal    Size    Age  Count
    0     1   Cats   Small  young      1
    1     2   Cats   Small  young      2
    2     3   Cats   Small  young      3
    3     1   Cats   Small    old      2
    4     2   Cats   Small    old      4
    5     3   Cats   Small    old      6
    6     1   Cats     Big  young      1
    7     2   Cats     Big  young      3
    8     3   Cats     Big  young      6
    9     1   Cats     Big    old      5
    10    2   Cats     Big    old      6
    11    3   Cats     Big    old      7
    12    1   Dogs  Little  young      3
    13    2   Dogs  Little  young      0
    14    3   Dogs  Little  young      9
    15    1   Dogs  Little    old      2
    16    2   Dogs  Little    old      1
    17    3   Dogs  Little    old      3
    18    1   Dogs   Large  young      8
    19    2   Dogs   Large  young      4
    20    3   Dogs   Large  young      5
    21    1   Dogs   Large    old      6
    22    2   Dogs   Large    old      6
    23    3   Dogs   Large    old      0