Tags: python, list, csv, carriage-return

Replace carriage returns in a list in Python


I have a list of values and need to remove the errant carriage returns that occasionally appear inside those values.

The file I need to clean up has the following format:

field1|field2|field3|field4|field5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|val
ue 3|value 4|value 5
value 1|value 2|value 3|va
lue 4|value 5

In the example above, the last two rows of data each contain an errant carriage return: one inside the 3rd value and one inside the 4th.
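Before fixing anything, it can help to confirm where the breaks are. A minimal sketch, assuming every complete record has exactly 5 fields (4 "|" separators); `find_short_lines` is a hypothetical helper name and `sample` is just the data shown above:

```python
def find_short_lines(text, expected_seps=4):
    """Return 1-based physical line numbers with fewer than expected_seps '|' separators."""
    # str.splitlines() splits on \n, \r\n and a lone \r alike.
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.count("|") < expected_seps
    ]


sample = (
    "field1|field2|field3|field4|field5\n"
    "value 1|value 2|value 3|value 4|value 5\n"
    "value 1|value 2|value 3|value 4|value 5\n"
    "value 1|value 2|val\n"
    "ue 3|value 4|value 5\n"
    "value 1|value 2|value 3|va\n"
    "lue 4|value 5\n"
)

print(find_short_lines(sample))  # [4, 5, 6, 7]
```

Each broken record shows up as two consecutive short lines whose separator counts add up to 4 (2 + 2 and 3 + 1 here), which is the property a fix can rely on.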

I have seen a few posts on how to address this, but nothing has worked for this case so far. Here is what I have tried:

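import os
import sys

# Raw strings: in a normal literal, the "\t" in "C:\temp\test.dat" is a tab character.
filetoread = r'C:\temp\test.dat'
filetowrite = r'C:\temp\test_updated.dat'

'''
Attempt 1
'''
# Copies every line unchanged, so the broken records stay broken.
# (In Python 3 it also raises TypeError: bytes read with "r+b" can't be written to a text-mode file.)
with open(filetoread, "r+b") as inf:
    with open(filetowrite, "w") as fixed:
        for line in inf:
            fixed.write(line)


'''
Attempt 2
'''
# Iterates over the characters of the file *name*, not the file's lines,
# and the modified line is thrown away.
for line in filetoread:
    line = line.replace("\n", "")


'''
Attempt 3
'''
# Removes the newline from every line (including the ones that legitimately end a record),
# and nothing is ever written out.
with open(filetoread, "r") as inf:
    for line in inf:
        if "\n" in line:
            line = line.replace("\n", "")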
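All three attempts look for "\n" although the question talks about carriage returns. In Python 3 that is usually fine: a file opened in text mode with the default newline=None translates "\r\n" and a lone "\r" into "\n" as it is read. The sketch below demonstrates this on an in-memory byte stream standing in for the file on disk:

```python
import io

# Raw bytes with a Windows line ending (\r\n), a lone carriage return (\r) and a Unix one (\n).
raw = b"value 1|val\r\nue 3\rend\n"

# open(..., "r") returns an io.TextIOWrapper; its default newline=None
# enables universal-newline translation on input.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8").read()
print(repr(text))  # 'value 1|val\nue 3\nend\n'

# Binary mode ("rb" or "r+b") skips the translation, so the \r characters survive.
print(repr(raw.decode("utf-8")))  # 'value 1|val\r\nue 3\rend\n'
```

So once the file is read in text mode, handling "\n" alone covers all three kinds of line break.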

Solution

  • Count the fields instead: a record is only complete once it holds 5 fields (4 "|" separators), so keep joining lines until that count is reached:

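    import re

    # A record ends at the first line break after its 4th "|" (i.e. 5 fields);
    # re.DOTALL lets ".*?" also match the errant line breaks inside a field.
    record = re.compile(r'(?:.*?\|){4}.*?\n', re.DOTALL)
    # Text mode ("r") yields str; "r+b" would yield bytes, which a str pattern rejects.
    with open(filetoread, "r") as inf, open(filetowrite, "w") as fixed:
        for m in record.finditer(inf.read()):
            # Remove the breaks inside the record and end it with a single newline.
            fixed.write(m.group(0).replace('\n', '') + '\n')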
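If the file is too large to read() in one go, the same counting idea can be applied line by line instead of with a regex. A minimal sketch, assuming 5 fields per record and that a break never falls inside the last field (a break there cannot be detected by counting separators, and the regex above has the same blind spot); `join_broken_records` is a hypothetical name:

```python
import io


def join_broken_records(lines, expected_seps=4):
    """Yield complete records, gluing physical lines together until a record
    holds expected_seps '|' separators (assumes 5 fields per record)."""
    buffer = ""
    for line in lines:
        buffer += line.rstrip("\r\n")  # drop the line break itself
        if buffer.count("|") >= expected_seps:
            yield buffer
            buffer = ""
    if buffer:
        # Trailing fragment that never reached the expected field count.
        yield buffer


sample = io.StringIO(
    "field1|field2|field3|field4|field5\n"
    "value 1|value 2|val\n"
    "ue 3|value 4|value 5\n"
    "value 1|value 2|value 3|va\n"
    "lue 4|value 5\n"
)
records = list(join_broken_records(sample))
for record in records:
    print(record)

# With real files, something like:
#   fixed.writelines(r + "\n" for r in join_broken_records(inf))
```

Because the function only keeps one partial record in memory, it works on an open file object of any size.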