Search code examples
pythonpandasdataframe

How to transform multi-columns to rows


I have a excel file with multi-columns as below (Sorry but I don't know how to recreate it with pandas): enter image description here

Below is my expected Output:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Code': ['11000000000', '11200100000', '11710000000', '11000000000', '11200100000', '11710000000', '11000000000', '11200100000', '11710000000'], 
                   'Code Name': ['Car', 'Motorbike', 'Bike', 'Car', 'Motorbike', 'Bike', 'Car', 'Motorbike', 'Bike'],
                   'Date': ['19-02-2024', '19-02-2024', '19-02-2024', '19-02-2024', '19-02-2024', '19-02-2024', '19-02-2024', '19-02-2024', '19-02-2024'],
                   'Customer': ['Customer A', 'Customer A', 'Customer A', 'Customer B', 'Customer B', 'Customer B', 'Customer ...', 'Customer ...', 'Customer ...'], 
                   'Point_1': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan], 
                   'Point_2': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]})
df

    Code        Code Name   Date        Customer      Point_1   Point_2
0   11000000000 Car         19-02-2024  Customer A    NaN       NaN
1   11200100000 Motorbike   19-02-2024  Customer A    NaN       NaN
2   11710000000 Bike        19-02-2024  Customer A    NaN       NaN
3   11000000000 Car         19-02-2024  Customer B    NaN       NaN
4   11200100000 Motorbike   19-02-2024  Customer B    NaN       NaN
5   11710000000 Bike        19-02-2024  Customer B    NaN       NaN
6   11000000000 Car         19-02-2024  Customer ...  NaN       NaN
7   11200100000 Motorbike   19-02-2024  Customer ...  NaN       NaN
8   11710000000 Bike        19-02-2024  Customer ...  NaN       NaN

What should I do to get this result. Thank you


Solution

  • Create DataFrame with MultiIndex first in index and columns by parameters index_col and header and then use DataFrame.stack with first level and dropna parameter for avoid remove rows with missing values:

    df = pd.read_excel(file, index_col=[0,1,2], header=[0,1])
    
    #test MultiIndex in columns
    print (df.columns)
    MultiIndex([('Customer A', 'Point_1'),
                ('Customer A', 'Point_2'),
                ('Customer B', 'Point_1'),
                ('Customer B', 'Point_2'),
                ('Customer ...', 'Point_1'),
                ('Customer ...', 'Point_2')],
               names=['Customer', None])
    
    #test MultiIndex in index
    print (df.index)
    MultiIndex([('11000000000',       'Car', '19-02-2024'),
                ('11200100000', 'Motorbike', '19-02-2024'),
                ('11710000000',      'Bike', '19-02-2024')],
               names=['Code', 'Code Name', 'Date'])
    

    EDIT: There is problem with missing values in headers, so is possible use alternative solution - create MultiIndex first in columns and use DataFrame.set_index with DataFrame.rename_axis:

    df = pd.read_excel('file.xls', header=[0,1])
    
    df = df.set_index(df.columns[:3].tolist()).rename_axis(df.columns[:3].get_level_values(0))
    print (df)
                                     Customer A         Customer B          \
                                        Point_1 Point_2    Point_1 Point_2   
    Code        Code Name Date                                               
    11000000000 Car       19-02-2024        NaN     NaN        NaN     NaN   
    11200100000 Motorbike 19-02-2024        NaN     NaN        NaN     NaN   
    11710000000 Bike      19-02-2024        NaN     NaN        NaN     NaN   
    
                                     Customer …          
                                        Point_1 Point_2  
    Code        Code Name Date                           
    11000000000 Car       19-02-2024        NaN     NaN  
    11200100000 Motorbike 19-02-2024        NaN     NaN  
    11710000000 Bike      19-02-2024        NaN     NaN  
    

    out = df.stack(0, dropna=False).reset_index()
    print (out)
              Code  Code Name        Date      Customer  Point_1  Point_2
    0  11000000000        Car  19-02-2024  Customer ...      NaN      NaN
    1  11000000000        Car  19-02-2024    Customer A      NaN      NaN
    2  11000000000        Car  19-02-2024    Customer B      NaN      NaN
    3  11200100000  Motorbike  19-02-2024  Customer ...      NaN      NaN
    4  11200100000  Motorbike  19-02-2024    Customer A      NaN      NaN
    5  11200100000  Motorbike  19-02-2024    Customer B      NaN      NaN
    6  11710000000       Bike  19-02-2024  Customer ...      NaN      NaN
    7  11710000000       Bike  19-02-2024    Customer A      NaN      NaN
    8  11710000000       Bike  19-02-2024    Customer B      NaN      NaN