I have been working on extracting data from a large number of files. I want to build a table of the data, with the file base name in the leftmost column and the numerical data in the next column. So far, I have been testing on a folder containing 8 files, but I am hoping to be able to read hundreds.
I have tried adding an index, but that seemed to cause more problems. I am attaching the closest working code I have come up with, alongside the output.
In:
import re, glob
import pandas as pd
pattern = re.compile('-\d+\D\d+\skcal/mol', flags=re.S)
for file in glob.glob('*rank_*.pdb'):
    with open(file) as fp:
        for result in pattern.findall(fp.read()):
            Dock_energy = {file:[],result:[]}
            df = pd.DataFrame(Dock_energy)
            df.append(df)
            df = df.append(df)
            print(df)
This seems to work for extracting the data, but it is not in the form I am looking for.
Out:
Empty DataFrame
Columns: [-10.02 kcal/mol, MII_rank_8.pdb]
Index: []
Empty DataFrame
Columns: [-12.51 kcal/mol, MII_rank_5.pdb]
Index: []
Empty DataFrame
Columns: [-13.47 kcal/mol, MII_rank_4.pdb]
Index: []
Empty DataFrame
Columns: [-14.67 kcal/mol, MII_rank_2.pdb]
Index: []
Empty DataFrame
Columns: [-13.67 kcal/mol, MII_rank_3.pdb]
Index: []
Empty DataFrame
Columns: [-14.80 kcal/mol, MII_rank_1.pdb]
Index: []
Empty DataFrame
Columns: [-11.45 kcal/mol, MII_rank_7.pdb]
Index: []
Empty DataFrame
Columns: [-12.47 kcal/mol, MII_rank_6.pdb]
Index: []
What is causing the fractured table, and why are my columns in the reverse order from what I am hoping for? Any help is greatly appreciated.
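Each pass through the loop builds a brand-new DataFrame from {file: [], result: []}, so the file name and the matched string become the column names, and both columns hold empty lists. That is why every printout is an empty DataFrame. df.append(df) returns a new object instead of modifying df, and appending an empty frame to itself is still empty. Because print(df) runs inside the loop, you get one fragment per file. The column order is most likely down to older pandas versions sorting dict keys when building a DataFrame, and '-' sorts before 'M'.
Instead, collect the rows in a plain list and build the DataFrame once at the end. This should be closer to what you intend: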
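# Reuses the imports and `pattern` from the question.
all_data = []  # one [file, result] row per match, across all files
for file in glob.glob('*rank_*.pdb'):
    with open(file) as fp:
        file_data = []
        for result in pattern.findall(fp.read()):
            file_data.append([file, result])
        all_data.extend(file_data)
# Build the DataFrame once, after the loop, with explicit column names.
df = pd.DataFrame(all_data, columns=['file', 'result'])
print(df)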
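If you also want the base name without the extension and the energy as a number rather than a string, something along these lines may help. It is only a sketch: the names ENERGY_PATTERN, extract_energies, build_table and the column name energy_kcal_per_mol are made up for illustration, and the pattern assumes the value is always a negative decimal followed by whitespace and "kcal/mol", as in your sample output.

```python
import re
from pathlib import Path

import pandas as pd

# Assumed pattern: a negative decimal number followed by "kcal/mol".
# The number is captured so the unit can be dropped before conversion.
ENERGY_PATTERN = re.compile(r"(-\d+\.\d+)\s+kcal/mol")


def extract_energies(text):
    """Return every matched energy in `text` as a float."""
    return [float(value) for value in ENERGY_PATTERN.findall(text)]


def build_table(paths):
    """Build one DataFrame with a row per (file, energy) match."""
    rows = []
    for path in map(Path, paths):
        for energy in extract_energies(path.read_text()):
            # Path.stem drops the directory and the ".pdb" extension.
            rows.append({"file": path.stem, "energy_kcal_per_mol": energy})
    # Explicit columns fix the order and keep the headers even with no rows.
    return pd.DataFrame(rows, columns=["file", "energy_kcal_per_mol"])


if __name__ == "__main__":
    table = build_table(sorted(Path(".").glob("*rank_*.pdb")))
    print(table.sort_values("energy_kcal_per_mol").to_string(index=False))
```

Use path.name instead of path.stem if you would rather keep the ".pdb" extension in the first column. Passing columns= explicitly is what pins the column order, so it no longer depends on how pandas orders dict keys.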