sed, line-endings

Why doesn't sed 's/\r\n/\r/g' work as expected?


In a normal Windows-to-Unix conversion, you can do something like sed 's/\r//g', which removes the \r characters from the stream.

But I'm trying to convert the line endings of files that could be Mac-encoded (\r) or Windows-encoded (\r\n), so I cannot just remove the \r: that would delete any Mac line endings. I have to "canonicalize" the line endings first. This canonicalization step converts \r\n to \r, after which I convert \r to \n. However, I can't get this step to work with sed. I tried something like this:

$> echo -e "foo\r\nbar" | sed 's/\r\n/\r/g' | xxd -c 24 -g 1
00000000: 66 6f 6f 0d 0a 62 61 72 0a            foo..bar.

I was able to solve it with bbe like this:

$> echo -e "foo\r\nbar" | bbe -e 's/\r\n/\r/g' | xxd -c 24 -g 1
00000000: 66 6f 6f 0d 62 61 72 0a               foo.bar.

Is it possible to do the same with sed?


Solution

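  • By default, sed splits its input on \n and strips that newline before placing each line in the pattern space, so the pattern \r\n has nothing to match. However, if you are using GNU sed, you can use the -z/--null-data option to make sed treat the input as NUL-separated lines, which keeps the \n characters in the pattern space: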

    $ echo -e "foo\r\nbar" | sed -z 's/\r\n/\r/g' | hd
    00000000  66 6f 6f 0d 62 61 72 0a                           |foo.bar.|
    

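    Alternatively, without -z, you can append all lines to the pattern space (with the N command in a loop), effectively copying the complete file into the pattern space, and then do the substitution. Note that the \r escape and a ; directly after a label are widely supported but not guaranteed by POSIX: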

    $ echo -e "foo\r\nbar" | sed -n ':a;N;$!ba; s/\r\n/\r/g; p' | hd
    00000000  66 6f 6f 0d 62 61 72 0a                           |foo.bar.|
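
Putting both steps of the conversion from the question together, here is a minimal sketch, assuming GNU sed (for -z and the \r escape). The helper name to_unix is hypothetical and only used for illustration; it is not part of the original answer.

```shell
# Hypothetical helper: normalize CRLF and lone CR line endings to LF.
# Assumes GNU sed, which provides -z and understands \r in patterns.
to_unix() {
    # -z makes sed split on NUL instead of \n, so \n stays in the pattern space.
    # Step 1 collapses every \r\n to \r; step 2 turns every remaining \r into \n.
    sed -z 's/\r\n/\r/g; s/\r/\n/g'
}

# Mixed Windows (\r\n), classic Mac (\r), and Unix (\n) endings in one stream.
printf 'one\r\ntwo\rthree\n' | to_unix | od -c
```

The od -c dump should show only \n terminators. Because -z reads everything up to the next NUL byte into the pattern space, an ordinary text file is held in memory in full, which may matter for very large files.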