I want to parse a flat file that looks like this in python;
Element ID Element Type Result Jacobian Sign
============== ================= ========= =====================
1 Parabolic Warning 1.000000
Hexahedron
2 Parabolic Warning 1.000000
Hexahedron
3 Parabolic Warning 1.000000
Hexahedron
4 Parabolic Warning 1.000000
I tried using the mechanism used in this answer as follows;
import pandas as pd
def parse_file(file):
col_spec = [(0, 15), (16, 33), (34, 43), (44, 65)]
return pd.read_fwf(file, colspecs=col_spec)
But it reads one record for the top row and one row which is empty apart from the word 'Hexahedron' as the element type.
>>> data = parse_file("example.txt")
>>> data.head()
Element ID Element Type Result Jacobian Sign
0 NaN NaN NaN NaN
1 ============== ================ ======== ====================
2 1 Parabolic Warning 1.000000
3 NaN Hexahedron NaN NaN <= Extra record
4 2 Parabolic Warning 1.000000
As you can see from lines, the first two rows are captured as 2 records (records 2 and 3). I want the parser to capture the first two rows as one record, so that the phrase 'Parabolic Hexahedron' is captured as the element type. How can I do this?
some post-processing should do the trick. Here is some code using the shift
operator. Also notice that opening the file is not required, just pass the filename to pd.read_fwf
.
import pandas as pd
col_spec = [(0, 15), (15, 32), (32, 42), (43, 65)]
df = pd.read_fwf("example.txt", colspecs=col_spec, comment="=")
# combine rows
df["combined"] = (df['Element Type'] + df['Element Type'].shift(-1)).where(df['Element ID'].notnull(), df['Element Type'] )
# remove extra rows
df = df[df['Element ID'].notnull()]
this should give a DataFrame that looks like this:
Element ID Element Type Result Jacobian Sign combined
2 1 Parabolic Warning 1.000000 ParabolicHexahedron
4 2 Parabolic Warning 1.000000 ParabolicHexahedron
6 3 Parabolic Warning 1.000000 ParabolicHexahedron
8 4 Parabolic Warning 1.000000 ParabolicHexahedron