Search code examples
pythonpandascsvpivotunpivot

How to load a csv file with uncommon structure in python to eventually pivot the data?


I'm having problems to load a simple .csv file with pd.read_csv(). The problem is the last header which groups multiple 'columns'.

The fifth columns looks like this:

Data_Series
0; 0,5; 1; 2; 4; 6
12; 15; 25; 33; 23
0; 0,3; 2; 3; 7; 8; 8,5; 9,5
22; 45; 35; 48; 55; 60; 62; 58

Expecting result should be a DataFrame with 5 columns
( with the Statistic column being pivoted [X, Y] to contain the Data_Series values ).

Any idea how to approach this? Thanks


Solution

  • import pandas as pd
    
    df_list = []
    
    with open('data_in.csv', 'r') as f:
        headers = False
        for line in f:
            if not headers:
                headers = line.strip().split('; ')
                continue
            
            # Machine; Process; Date; Statistic; Data_Series -> 5 columnss -> split 4 times
            unparsed = line.strip().split('; ', len(headers)-1)
            
            # split the last column
            parsed = [float(val.replace(',', '.')) for val in unparsed[-1].split('; ')]
            
            # add data to list after removeing any spaces on "Statistic"
            df_list.append(unparsed[:-2] + [unparsed[-2].strip()] + [parsed])
    
    df = pd.DataFrame(df_list, columns=headers)
    
    # pivot > explode > reset_index
    df_new = df.pivot(index=["Machine", "Process", "Date"], columns="Statistic", values="Data_Series").explode(list('XY')).reset_index()
    
    # remove columns names
    df_new.columns.name = None
    
    Machine Process Date X Y
    A P22 08.12.2015 0 12
    A P22 08.12.2015 0.5 15
    A P22 08.12.2015 1 25
    A P22 08.12.2015 2 33
    A P22 08.12.2015 4 23
    A P22 08.12.2015 6 45
    A PX67 09.12.2015 0 22
    A PX67 09.12.2015 0.3 45
    A PX67 09.12.2015 2 35
    A PX67 09.12.2015 3 48
    A PX67 09.12.2015 7 55
    A PX67 09.12.2015 8 60
    A PX67 09.12.2015 8.5 62
    A PX67 09.12.2015 9.5 58