I have been receiving a text file where each row should be 246 columns in length. For some reason an errant CRLF is being inserted in the file after every 23,036 characters, causing all sorts of problems.
The file is in Windows format; all line endings are CRLF.
Is there some way to strip these extra CRLF sequences out of the file without disturbing the CRLF at the end of every other line? Unix tools would be the preferred method here, if possible (awk, sed, etc.).
Below is a sample of what the block of text looks like when an extra CRLF has been added. Please note that this file is 258 MB in size, and that the extra CRLF falls at a different position within the line further down the file.
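When you're not sure at which positions the extra CRLFs occur, you can delete all line endings and re-insert them at the right places (the fold width must equal the number of data characters per row, not counting the CRLF):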
(tr -d "\r\n" < my_inputfile | fold -w 245;echo) | sed 's/$/\r/'
The echo is needed, since fold will not add a newline after the last line.
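To see the effect on something small, here is a minimal sketch that runs the same pipeline on a made-up sample. The file names sample.txt and fixed.txt, the fix_crlf helper, and the 5-character row width are all assumptions for illustration only. Note that `\r` in the sed replacement is understood by GNU sed; other sed implementations may treat it differently.

```shell
# Hypothetical helper wrapping the pipeline from the answer.
# $1 = data characters per row (without the CRLF), $2 = input file
fix_crlf() {
  (tr -d '\r\n' < "$2" | fold -w "$1"; echo) | sed 's/$/\r/'
}

# Made-up sample: three 5-character rows, with a stray CRLF inside the second row.
printf 'AAAAA\r\nBB\r\nBBB\r\nCCCCC\r\n' > sample.txt

# Rebuild the rows at a fixed width of 5.
fix_crlf 5 sample.txt > fixed.txt

# Sanity check: count rows whose length is not width + 1 (the +1 is the trailing CR).
# This prints 0 when every row has the expected length.
awk -v n=5 'length($0) != n + 1 { bad++ } END { print bad + 0 }' fixed.txt
```

For the real file, the first argument would be the actual row width and the second the real input file. Because tr, fold, and sed all stream their input, the file size should not be a problem.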