
Handling fixed-length records with embedded unix newlines


I am receiving a text file with fixed-length fields and records delimited by carriage return/newline (CRLF). Recently, one of the text fields started to contain a bare newline character (LF) inside the record. This is obviously causing some problems on our Unix server.

I would like to simply find each LF in the file and replace it with a single space, but a naive replacement would also hit the LF that is part of each Windows newline (CRLF).

I have tried tr and perl but can't quite seem to get it right:

cat badinput.txt | perl -p -e 's/\x0D\x0A/\x0D/' | perl -p -e 's/\x0A/ /' | perl -p -e 's/\x0D/\x0D\x0A/' > goodoutput.txt

The idea is to

  • replace CRLF with CR
  • replace LF with a space
  • replace CR with CRLF

For some reason I'm not quite getting the CR -> CRLF transformation.

Suggestions?


Solution

  • You can read the whole input with -0777 and then do the substitution:

    perl -0777pe 's/\r\n/\r/g;s/\n/ /g;s/\r/\r\n/g' badinput.txt
    

    The parameters are:

    • -p, which prints the value of $_ after each "line" (record) has been processed
    • -0777, which sets the input record separator to undef, so the whole file is read as a single record

    Perl Command-line Options