In normal Windows-to-Unix conversion, you can do something like sed 's/\r//g', which removes the \r characters from the stream.
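To make the problem concrete, here is a small sketch of what plain \r removal does to each kind of line ending. It assumes GNU sed (which understands the \r escape in a regex), and strip_cr is a hypothetical helper name used only for illustration:

```shell
# strip_cr is a hypothetical helper name; assumes GNU sed,
# which understands the \r escape in a regex.
strip_cr() {
    sed 's/\r//g' "$@"
}

# Windows endings: removing \r leaves ordinary Unix lines.
printf 'foo\r\nbar\r\n' | strip_cr | od -An -c
# Classic Mac endings: \r is the only separator, so removing it
# glues "foo" and "bar" into one unterminated line.
printf 'foo\rbar\r' | strip_cr | od -An -c
```

The first dump still contains two \n line endings, while the second contains no \n at all, which is why plain \r removal is not enough for Mac files.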
But I'm trying to convert the line endings of files that could use classic Mac (\r) or Windows (\r\n) endings. So I cannot just remove every \r, as that would delete the Mac line endings, if there are any. I first have to "canonicalize" the line endings: this step converts \r\n to \r (after which I do the \r to \n conversion). Yet I'm not able to do this canonicalization step with sed. I tried something like this:
$> echo -e "foo\r\nbar" | sed 's/\r\n/\r/g' | xxd -c 24 -g 1
00000000: 66 6f 6f 0d 0a 62 61 72 0a foo..bar.
I was able to solve it with bbe like this:
$> echo -e "foo\r\nbar" | bbe -e 's/\r\n/\r/g' | xxd -c 24 -g 1
00000000: 66 6f 6f 0d 62 61 72 0a foo.bar.
Is it possible to do the same with sed?
sed splits its input on \n by default, so \n never ends up in the pattern space. However, if you are using GNU sed, you can use the -z/--null-data option to make sed treat the input as NUL-separated lines:
$ echo -e "foo\r\nbar" | sed -z 's/\r\n/\r/g' | hd
00000000 66 6f 6f 0d 62 61 72 0a |foo.bar.|
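Since the end goal is for both \r\n and \r to become \n, the two steps can be chained in a single GNU sed -z call. This is a minimal sketch, assuming GNU sed and a file small enough to hold in memory (an ordinary text file becomes one record); to_unix_eol is a hypothetical name:

```shell
# to_unix_eol is a hypothetical helper name; assumes GNU sed for -z
# and for the \r escape.
to_unix_eol() {
    # With -z, records are split on NUL bytes, so an ordinary text file is
    # read as a single record and \r\n pairs are visible to the regex.
    # Step 1 canonicalizes \r\n to \r; step 2 turns every \r into \n.
    # Existing Unix \n endings are left untouched.
    sed -z 's/\r\n/\r/g; s/\r/\n/g' "$@"
}

# Mixed input: a Windows line, a Mac line and a Unix line.
printf 'win\r\nmac\runix\n' | to_unix_eol | od -An -c
```

File names work as well as stdin, e.g. to_unix_eol input.txt > output.txt (file names hypothetical). The order of the two substitutions matters: running s/\r/\n/g first would turn each \r\n into \n\n.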
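Alternatively, without -z, you can append all lines to the pattern space (with the N command in a loop), effectively copying the complete file into the pattern space, and then do the substitution. The loop has to branch back until the last line has been read ($!ba): t only branches after a successful substitution, so :a;N;ta would not loop and would only join lines in pairs. Note that the \r escape and the ;-separated labels are GNU extensions, so this form still assumes GNU sed:
$ echo -e "foo\r\nbar" | sed ':a;N;$!ba; s/\r\n/\r/g' | hd
00000000 66 6f 6f 0d 62 61 72 0a |foo.bar.|
One caveat: the newline that ends the last line never enters the pattern space, so a \r\n at the very end of the file is left as it is, whereas the -z version converts it too.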