I'm having problems to load a simple .csv file with pd.read_csv()
.
The problem is the last header which groups multiple 'columns'.
The fifth columns looks like this:
Data_Series
0; 0,5; 1; 2; 4; 6
12; 15; 25; 33; 23
0; 0,3; 2; 3; 7; 8; 8,5; 9,5
22; 45; 35; 48; 55; 60; 62; 58
Expecting result should be a DataFrame with 5 columns
( with the Statistic column being pivoted [X, Y] to contain the Data_Series values ).
Any idea how to approach this? Thanks
import pandas as pd
df_list = []
with open('data_in.csv', 'r') as f:
headers = False
for line in f:
if not headers:
headers = line.strip().split('; ')
continue
# Machine; Process; Date; Statistic; Data_Series -> 5 columnss -> split 4 times
unparsed = line.strip().split('; ', len(headers)-1)
# split the last column
parsed = [float(val.replace(',', '.')) for val in unparsed[-1].split('; ')]
# add data to list after removeing any spaces on "Statistic"
df_list.append(unparsed[:-2] + [unparsed[-2].strip()] + [parsed])
df = pd.DataFrame(df_list, columns=headers)
# pivot > explode > reset_index
df_new = df.pivot(index=["Machine", "Process", "Date"], columns="Statistic", values="Data_Series").explode(list('XY')).reset_index()
# remove columns names
df_new.columns.name = None
Machine | Process | Date | X | Y |
---|---|---|---|---|
A | P22 | 08.12.2015 | 0 | 12 |
A | P22 | 08.12.2015 | 0.5 | 15 |
A | P22 | 08.12.2015 | 1 | 25 |
A | P22 | 08.12.2015 | 2 | 33 |
A | P22 | 08.12.2015 | 4 | 23 |
A | P22 | 08.12.2015 | 6 | 45 |
A | PX67 | 09.12.2015 | 0 | 22 |
A | PX67 | 09.12.2015 | 0.3 | 45 |
A | PX67 | 09.12.2015 | 2 | 35 |
A | PX67 | 09.12.2015 | 3 | 48 |
A | PX67 | 09.12.2015 | 7 | 55 |
A | PX67 | 09.12.2015 | 8 | 60 |
A | PX67 | 09.12.2015 | 8.5 | 62 |
A | PX67 | 09.12.2015 | 9.5 | 58 |