I have quite a messy txt file which I need to convert to a dataframe to use as reference data. An excerpt is shown below:
I've cleaned it up as best I can, but to cut a long story short, I would like to split most of each line on spaces and then treat the last column as a single fixed field, i.e. ignore the spaces in the last section.
Can anyone point me in the right direction of a resource which can do this? Not sure if Pandas copes with this?
Kenny
P.S. I have found some great resources to clean up the multiple whitespaces and replace the line breaks. Sorry can't find the original reference, so see attached.
    import re

    # Collapse runs of spaces into a single space and trim each line
    with open("Input.txt", "rt") as fin, open("Ouput.txt", "wt") as fout:
        for line in fin:
            fout.write(re.sub(' +', ' ', line).strip() + "\n")
What I would do is very simple: clean up the data as much as possible and then convert it to a CSV file, because those are easy to work with. Then I would load it into a pandas dataframe step by step and change it as needed.
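    import csv

    with open("NudatClean.txt") as f:
        text = f.readlines()

    with open('dat.csv', 'w', newline='') as file:
        writer = csv.writer(file)
        for i in text:
            # strip() drops the trailing newline so it does not end up in the last field
            l = i.strip().split(' ')
            row = []
            for a in l:
                # skip the empty strings left behind by repeated spaces
                if a != '':
                    row.append(a)
            print(row)
            writer.writerow(row)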
That should do the job to begin with. I don't know exactly what you want to remove, but I think the rest should be pretty clear from here.
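If the spaces in the last column need to be kept, one option is to limit the number of splits so that everything after the leading fields stays together. This is only a rough sketch: the helper name `split_keep_last`, the sample line and the number of leading columns are made up for illustration, so adjust them to your file.

```python
def split_keep_last(line, n_leading):
    """Split on whitespace into n_leading fields; the remainder stays as one field."""
    # With sep=None, split() treats any run of whitespace as one separator,
    # and maxsplit=n_leading stops splitting after the leading fields.
    return line.strip().split(None, n_leading)


# Hypothetical sample line: three space-delimited fields, then free text
sample = "A1 10  2.5 some free text here\n"
row = split_keep_last(sample, 3)
print(row)  # ['A1', '10', '2.5', 'some free text here']
```

Rows built this way can be written out with `writer.writerow(row)` as above, or collected in a list and passed to `pandas.DataFrame(rows, columns=[...])` with your own column names.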