i have a file containing large data in the following format :
[metadata]
metadata1 metadata2 metadata3
[data]
label1 label2 label3
0 0 0
0 0 0
0 0 0
...
I am trying to load the data section to process in my python script as a pandas dataframe.
What i've tried so far is loading the file in excel and saving it as a sheet then loading in into a dataframe using pandas.read_excel()
but this is not a practical solution as you can imagine.
I would appreciate help on this subject and thanks !
I would approach it following way: open file for reading, consume lines until [data]
is found, give it to pandas.read_csv
specyfing whitespaces as separator. Let file.txt
content
[metadata]
metadata1 metadata2 metadata3
[data]
label1 label2 label3
0 0 0
0 0 0
0 0 0
then
import pandas as pd
f = open("file.txt", "r")
while(next(f).strip()!='[data]'):
pass
df = pd.read_csv(f, sep='\s+')
f.close()
print(df)
output
label1 label2 label3
0 0 0 0
1 0 0 0
2 0 0 0
Explanation: tail of file might be considered as csv file with whitespaces as separator, by consuming lines up to [data]
I am seeking to point so what follows is legal csv, then use pandas.read_csv
function. Note that if number of lines before line with labels is known you might just exploit pandas.read_csv
's skiprows
.