I have a pipe-delimited file and need to remove errant carriage returns wherever they occur inside a value.
The format of the file I am trying to clean up is as follows:
field1|field2|field3|field4|field5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|val
ue 3|value 4|value 5
value 1|value 2|value 3|va
lue 4|value 5
I am looking to handle a situation like the one above, where errant carriage returns split the 3rd value in one record and the 4th value in another (the last 2 records of data).
I have seen a few posts on how to address this, but so far nothing has worked for this situation. The code I have attempted so far is pasted below.
import os
import sys
# Raw strings keep "\t" in the Windows paths from being read as a tab character.
filetoread = r'C:\temp\test.dat'
filetowrite = r'C:\temp\test_updated.dat'
'''
Attempt 1
'''
with open(filetoread, "r+b") as inf:
    with open(filetowrite, "w") as fixed:
        for line in inf:
            fixed.write(line)
'''
Attempt 2
'''
for line in filetoread:
    line = line.replace("\n", "")
'''
Attempt 3
'''
with open(filetoread, "r") as inf:
    for line in inf:
        if "\n" in line:
            line = line.replace("\n", "")
You have to count the fields, so that every output line contains exactly 5:
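import re
# Read in text mode so the str pattern matches and "\r\n" is normalized to "\n".
with open(filetoread, "r") as inf, open(filetowrite, "w") as fixed:
    # One record = 4 pipe-terminated fields plus a 5th field ending at a newline.
    # re.DOTALL lets "." cross the stray line breaks inside a field.
    for m in re.finditer(r'(?:.*?\|){4}.*?\n', inf.read(), re.DOTALL):
        # Drop the embedded line breaks, then terminate the record once.
        fixed.write(m.group(0).replace('\n', '') + '\n')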
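If you would rather not read the whole file into memory, the same field-counting idea can be sketched line by line. This is only a rough sketch under some assumptions: the function name `merge_broken_records` and its parameters are made up for illustration, every record is assumed to have exactly 5 fields, and the `|` character is assumed never to appear inside a value. A fragment with no `|` at all is glued onto the record before it, so a break inside the last field is repaired, but a break inside the first field of a record cannot be told apart from that case.

```python
def merge_broken_records(lines, n_fields=5, sep="|"):
    """Yield records rebuilt from lines that were split by stray line breaks.

    Hypothetical helper: assumes n_fields fields per record and that
    sep never occurs inside a value.
    """
    max_seps = n_fields - 1
    buffer = ""
    for raw in lines:
        piece = raw.rstrip("\r\n")
        # If adding this piece would give more separators than one record
        # can hold, the buffered text is a complete record: emit it first.
        if buffer and buffer.count(sep) + piece.count(sep) > max_seps:
            yield buffer
            buffer = ""
        buffer += piece
    # Emit whatever is left once the input is exhausted.
    if buffer:
        yield buffer


# The sample from the question, with the same two broken records.
sample = [
    "field1|field2|field3|field4|field5\n",
    "value 1|value 2|value 3|value 4|value 5\n",
    "value 1|value 2|value 3|value 4|value 5\n",
    "value 1|value 2|val\n",
    "ue 3|value 4|value 5\n",
    "value 1|value 2|value 3|va\n",
    "lue 4|value 5\n",
]

for record in merge_broken_records(sample):
    print(record)
```

To use it on the real file, you could pass the open file object instead of `sample` (for example `for record in merge_broken_records(inf): fixed.write(record + "\n")`), since iterating a text-mode file yields one line at a time.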