We have data from a device that measures multiple parts, and it outputs multiple measurements for each part into a CSV file. We read the CSV file into a dataframe with a structure such as this:
PartNo 12
Meas1 45
Meas2 23
!END
PartNo 13
Meas1 63
Meas2 73
!END
PartNo 12
Meas1 82
Meas2 84
!END
The "!END" flag indicates where the data from one part ends, and the next part starts. We would like to reshape the data so it looks like:
PartNo Meas1 Meas2
12 45 23
13 63 73
12 82 84
(Note that a part could appear more than once - so there is no field that is guaranteed to be unique across all records.)
A pivot produces:
0 !END Meas1 Meas2 PartNo
0 NaN NaN NaN 12.0
1 NaN 45.0 NaN NaN
2 NaN NaN 23.0 NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN 13.0
5 NaN 63.0 NaN NaN
6 NaN NaN 73.0 NaN
7 NaN NaN NaN NaN
8 NaN NaN NaN 12.0
9 NaN 82.0 NaN NaN
10 NaN NaN 84.0 NaN
11 NaN NaN NaN NaN
How do I squeeze these rows down to group by PartNo?
A transpose produces:
0 1 2 3 4 5 6 7 8 9 10 11
0 PartNo Meas1 Meas2 !END PartNo Meas1 Meas2 !END PartNo Meas1 Meas2 !END
1 12 45 23 NaN 13 63 73 NaN 12 82 84 NaN
How could I reset the row every 4th item?
I could create a new index column in the original dataframe, and then iterate through the rows incrementing the index for every row with !END (and then use the index to group the data), but it seems that there ought to be a more elegant shape shifting function to handle this case, or maybe there is an argument to Pivot or Transpose that would handle this. I am a Python beginner. Here is the full code:
import pandas as pd
from io import StringIO
tdata = (
'PartNo, 12\n'
'Meas1, 45\n'
'Meas2, 23\n'
'!END\n'
'PartNo, 13\n'
'Meas1, 63\n'
'Meas2, 73\n'
'!END\n'
'PartNo, 12\n'
'Meas1, 82\n'
'Meas2, 84\n'
'!END\n')
tdf = pd.read_csv(StringIO(tdata), header=None)
print(tdf)
print(tdf.pivot(index=None, columns=0, values=1))
print(tdf.T)
#having dataframe x:
>>> x = pd.DataFrame([['PartNo',12],['Meas1',45],['Meas2',23],['!END',''],['PartNo',13],['Meas1',63],['Meas2',73],['!END',''],['PartNo',12],['Meas1',82],['Meas2',84],['!END','']])
>>> x
0 1
0 PartNo 12
1 Meas1 45
2 Meas2 23
3 !END
4 PartNo 13
5 Meas1 63
6 Meas2 73
7 !END
8 PartNo 12
9 Meas1 82
10 Meas2 84
11 !END
#grouping by first column, and aggregating values to list. First column then contains Series that you want. By converting each list in this series to series, dataframe is created, then you just need to transpose
>>> df = x.groupby(0).agg(lambda x: list(x))[1].apply(lambda x: pd.Series(x)).transpose()
>>> df[['PartNo','Meas1','Meas2']]
0 PartNo Meas1 Meas2
0 12 45 23
1 13 63 73
2 12 82 84