I have a huge CSV file (196,244 lines) that contains \n in places other than line endings. I want to remove those stray \n characters but keep \r\n intact.
I've tried line.replace, but it doesn't seem to recognize \r\n, so next I tried a regex:
with open(filetoread, "r") as inf:
    with open(filetowrite, "w") as fixed:
        for line in inf:
            line = re.sub("(?<!\r)\n", " ", line)
            fixed.write(line)
but it does not keep \r\n; it removes every line break. I can't do this in Notepad++ because it crashes on this file.
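The regex never gets to see the \r characters. When you open a file in text mode ("r"), line breaks are normalized to LF by default, so every \r\n has already become a plain \n by the time the lookbehind runs, and every line break is replaced. To keep the line breaks exactly as they are in the input, read the file in binary mode by adding b to the mode. Then you also need to use the b prefix on the regex pattern and the replacement, because binary mode gives you bytes rather than str.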
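To see the normalization in isolation, here is a minimal sketch that writes a few bytes to a temporary file and reads them back in both modes. The helper name read_both_ways and the sample data are made up for illustration; they are not part of the question.

```python
import os
import tempfile


def read_both_ways(raw: bytes):
    """Write raw bytes to a temp file, then read it back in text and binary mode."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Text mode with the default newline handling: \r\n is translated to \n.
        with open(path, "r", encoding="utf-8") as f:
            as_text = f.read()
        # Binary mode: the bytes come back untouched.
        with open(path, "rb") as f:
            as_bytes = f.read()
    finally:
        os.remove(path)
    return as_text, as_bytes


text, data = read_both_ways(b"a,b\nc\r\n")
print(repr(text))  # the \r is gone, so a lookbehind for \r can never match
print(repr(data))  # the \r\n is still there
```

In the text-mode result, a bare \n and a former \r\n look identical, which is why the lookbehind cannot tell them apart.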
You can use

import re

with open(filetoread, "rb") as inf:
    with open(filetowrite, "wb") as fixed:
        fixed.write(re.sub(b"(?<!\r)\n", b" ", inf.read()))
Now the whole file is read into a single bytes object (with inf.read()), so the regex sees the original line breaks and replaces only the \n characters that are not preceded by \r.
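As a quick check of what the pattern does, here is a small sketch that applies it to an in-memory sample instead of a file. The names BARE_LF and remove_bare_lf and the sample row are illustrative assumptions.

```python
import re

# Same pattern as above: an LF that is not preceded by a CR.
BARE_LF = re.compile(rb"(?<!\r)\n")


def remove_bare_lf(data: bytes) -> bytes:
    """Replace every bare LF with a space, leaving CRLF pairs alone."""
    return BARE_LF.sub(b" ", data)


# First record has a stray \n inside a field; both records end with \r\n.
sample = b"1,first\nhalf,x\r\n2,ok,y\r\n"
print(remove_bare_lf(sample))  # b'1,first half,x\r\n2,ok,y\r\n'
```

Only the stray \n in the first record is replaced; both \r\n terminators survive.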
Pay attention to:
- "rb" when reading the file in
- "wb" when writing the file out
- re.sub(b"(?<!\r)\n", b" ", inf.read()) uses the b prefix on its string literals, and inf.read() reads the file contents into a single variable.
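If reading the whole file into memory is a concern, the same substitution should also work line by line, as long as the file is still opened in binary mode. This is a sketch of that variant, not part of the original answer; fix_stream and fix_file are hypothetical helper names. It relies on the fact that iterating a binary file splits only on b"\n", so the \r stays at the end of each line for the lookbehind to find.

```python
import io
import re

BARE_LF = re.compile(rb"(?<!\r)\n")


def fix_stream(inf, fixed):
    """Copy inf to fixed, replacing each LF not preceded by CR with a space."""
    for line in inf:  # binary iteration splits on b"\n" only
        fixed.write(BARE_LF.sub(b" ", line))


def fix_file(filetoread, filetowrite):
    """Apply fix_stream to two paths, opening both in binary mode."""
    with open(filetoread, "rb") as inf, open(filetowrite, "wb") as fixed:
        fix_stream(inf, fixed)


# Small in-memory check using BytesIO in place of real files.
src = io.BytesIO(b"1,first\nhalf\r\n2,ok\r\n")
dst = io.BytesIO()
fix_stream(src, dst)
print(dst.getvalue())  # b'1,first half\r\n2,ok\r\n'
```

For a file of this size (about 196,000 lines) reading everything at once is usually fine; the streaming form is mainly useful for files that do not fit comfortably in memory.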