unix awk sed carriage-return linefeed

Remove CRLF from the middle of a file


I have been receiving a text file where each row should be 246 columns in length. For some reason an errant CRLF is being inserted in the file after every 23,036 characters, causing all sorts of problems.

The file is in Windows format: all line endings are CRLF.

Is there some way to strip these extra CRLF characters out of the file without disturbing the CRLF at the end of every other line? Unix tools (awk, sed, etc.) would be the preferred method, if possible.

Below is a sample of what the block of text looks like when an extra CRLF has been added. Please note that this file is 258 MB in size, and that the extra CRLF falls at a different position within the row each time it occurs further down the file.

[Image: sample block of text with an extra CRLF]


Solution

  • When you're not sure of the position, you can delete all line endings and then add them back in the right places:

    (tr -d "\r\n" < my_inputfile | fold -w 245;echo) | sed 's/$/\r/'
    

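    The echo is needed because fold does not add a newline after the last line. The value given to -w is the row width without the CRLF, and the \r escape in the sed replacement is a GNU sed extension.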
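
If GNU sed is not available, the same idea can be sketched with awk doing the final step, plus a small check that every row came out at the expected length. This is only a sketch: the function names rewrap and count_bad are made up for illustration, and the awk printf stands in for the echo and sed steps of the answer above (awk adds the line ending itself, even to a last line that has no newline).

```shell
#!/bin/sh

# rewrap WIDTH (hypothetical helper): read CRLF text on stdin, delete every
# CR and LF, then re-emit the data as rows of WIDTH characters, each one
# terminated by CRLF. WIDTH is the row length without the line ending.
rewrap() {
    tr -d '\r\n' | fold -w "$1" | awk '{ printf "%s\r\n", $0 }'
}

# count_bad WIDTH (hypothetical helper): print how many rows on stdin do not
# have exactly WIDTH characters followed by a CR. A clean file prints 0.
count_bad() {
    awk -v w="$1" 'length($0) != w + 1 { bad++ } END { print bad + 0 }'
}
```

Usage would look something like `rewrap 245 < my_inputfile > fixed.txt` followed by `count_bad 245 < fixed.txt`, with 245 taken from the answer above. If the total amount of data is not a multiple of the width, the last row is short and count_bad reports it.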