I wrote a script that parses some monolithic output into a pandas DataFrame for easy plotting and manipulation.
I can read different sections of the file and turn them into a bigger DataFrame almost every time.
For example, I can combine the following two sections:
E*/MeV 224Pu 225Pu 226Pu 227Pu 228Pu 229Pu 230Pu
48.0 0.192E-11 0.106E-05 0.498E-05 0.354E-06 0.000E+00 0.000E+00 0.000E+00
...
...
and
E*/MeV 224Pu 225Pu 226Pu 227Pu 228Pu 229Pu 230Pu
48.0 0.192E-11 0.106E-05 0.498E-05 0.354E-06 0.000E+00 0.000E+00 0.000E+00
...
...
by turning each into a DataFrame with a crafty one-liner:
pd.read_csv(filename, encoding='latin1', sep=r'\s+', skip_blank_lines=True, skiprows=lambda x: x not in rows_to_keep)
Once they are DataFrames, I can concatenate them.
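Roughly, the whole flow is something like the sketch below (rows_for_section and n_sections are just placeholders for however the line numbers of each section are determined, not actual code from my script):

import pandas as pd

frames = []
for i in range(n_sections):  # placeholder: number of sections in the file
    rows_to_keep = rows_for_section(i)  # placeholder: line numbers belonging to section i
    frames.append(pd.read_csv(
        filename,
        encoding='latin1',
        sep=r'\s+',
        skip_blank_lines=True,
        skiprows=lambda x: x not in rows_to_keep,
    ))
combined = pd.concat(frames, ignore_index=True)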
HOWEVER, my problem arises when the headers suddenly also contain a nasty space, like below (notice the difference between 224Pu and 222 U). This last section completely breaks my read_csv.
E*/MeV 222 U 223 U 224 U 225 U 226 U 227 U 228 U
48.0 0.722E-06 0.102E-03 0.306E-04 0.188E-06 0.671E-06 0.976E-06 0.977E-08
...
...
Is there any way to apply different "sep" settings for the header and the data?
Thank you in advance.
I tried different versions of sep, including sep=r'\s{2,}', and it reads the headers correctly but then obviously misses the data.
I do not see how adding more separators would help, since both are spaces, or maybe I just don't know enough about regular expressions.
You can handle headers and data separately:
import re
import pandas as pd

filename = 'data.csv'
with open(filename) as fp:
    # Read until the first non-blank line, which holds the column names.
    while True:
        row = fp.readline().strip()
        if row:
            # Split the header on runs of 2+ spaces so '222 U' stays one column name.
            headers = re.split(r'\s{2,}', row)
            break
    # The file handle now sits at the data, which splits fine on any whitespace.
    df = pd.read_csv(fp, sep=r'\s+', header=None, names=headers)  # plus any other options you need
First case:
>>> df
E*/MeV 224Pu 225Pu 226Pu 227Pu 228Pu 229Pu 230Pu
0 48.0 1.920000e-12 0.000001 0.000005 3.540000e-07 0.0 0.0 0.0
Second case:
>>> df
E*/MeV 222 U 223 U 224 U 225 U 226 U 227 U 228 U
0 48.0 7.220000e-07 0.000102 0.000031 1.880000e-07 6.710000e-07 9.760000e-07 9.770000e-09