Why error "illegal character: ^M"?


I have a formatting problem that I cannot understand; a short example using Zsh is below. Why does this happen?

$ values=( 300 400 )
$ echo "scale=20; $values[1]-$values[2]" | bc
(standard_in) 1: illegal character: ^M         # Why does it not print -100?
$ echo $values                                 # no ^M sign found!
300 400

Helper questions

  1. Why is ^M, i.e. 5E 4D 0A, the 13th character in ASCII?
  2. Why is the ending byte 0a shown as a dot "."? The "." is 2E in hex.

Solution

  • Unix and Windows use different line ending formats. In the Unix world, lines end with the linefeed character (LF, ASCII char 10). Windows ends lines with a carriage return (CR, ASCII char 13) followed by a linefeed.
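    A minimal sketch of the difference at the byte level, using only printf and od. The hexbytes function is a hypothetical helper written for this illustration, not a standard command.

```shell
# hexbytes: hypothetical helper that prints its stdin as one unbroken
# run of lowercase hex digits, two per byte.
hexbytes() {
    od -An -tx1 | tr -d ' \n'
}

unix_line=$(printf 'abc\n' | hexbytes)       # line ends with LF only
windows_line=$(printf 'abc\r\n' | hexbytes)  # line ends with CR, then LF

echo "Unix:    $unix_line"     # Unix:    6162630a
echo "Windows: $windows_line"  # Windows: 6162630d0a
```

    The only difference is the extra 0d (CR) byte before the 0a (LF).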

    Files with Windows line endings must be converted to Unix format before they can work with Unix tools. Otherwise programs like bc see the CR characters as junk and complain, as in your case.
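    A sketch of how a stray CR ends up inside a value, assuming the numbers were read from a file saved with Windows line endings (the question does not say where they came from). Command substitution strips the trailing LF but keeps the CR, so the value is one byte longer than it looks; removing CRs with tr before using the value fixes it. The bc call only runs if bc is installed.

```shell
# Simulate a value taken from a CRLF line: $(...) removes the final LF
# but leaves the CR in place.
raw=$(printf '400\r\n')
echo "raw length:   ${#raw}"     # raw length:   4

# Deleting every CR first leaves only the digits.
clean=$(printf '400\r\n' | tr -d '\r')
echo "clean length: ${#clean}"   # clean length: 3

# Build the same expression as in the question from the cleaned value.
calc="scale=20; 300-$clean"
if command -v bc >/dev/null 2>&1; then
    echo "$calc" | bc            # -100
fi
```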

    To convert files to Unix format, you can use dos2unix(1) if you have it installed, or alternatively pass them through sed 's/^M//g' (but don't type a literal ^M: press Ctrl+V, followed by Ctrl+M).
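    A sketch of the conversion with standard tools only, for systems without dos2unix. The file names crlf.txt, unix.txt and unix_sed.txt are hypothetical. The sed variant builds the CR with printf, so no literal ^M has to be typed, and it removes a CR only at the end of a line, whereas tr deletes every CR in the file.

```shell
# Create a small Windows-format file for the demonstration.
printf '300\r\n400\r\n' > crlf.txt

# Option 1: tr deletes every CR byte.
tr -d '\r' < crlf.txt > unix.txt

# Option 2: sed deletes a CR that sits at the end of a line.
# $cr holds a real carriage return; \$ becomes the regex end anchor.
cr=$(printf '\r')
sed "s/$cr\$//" crlf.txt > unix_sed.txt
```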

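    So why ^M? The carriage return is a nonprintable character with no glyph of its own. Programs that make control characters visible, such as bc in its error message, cat -v, and editors like Vim, show it in caret notation as ^M*. So why didn't it appear when you ran echo $values? echo did print it, but a terminal does not draw a carriage return: it moves the cursor back to the start of the line. At the very end of a line that has no visible effect, so you don't see it.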
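    A sketch showing that echo really does output the CR, assuming (as above) that the second value came from a CRLF line. Piping the same output through cat -v or od -c makes the hidden byte visible.

```shell
# Assumption: the second value was read from a CRLF line.
second=$(printf '400\r\n')

# Looks like "300 400" on a terminal: the CR only moves the cursor.
echo "300 $second"

# cat -v prints control characters in caret notation.
echo "300 $second" | cat -v     # 300 400^M

# od -c names each byte, so the CR shows up as \r before the \n.
echo "300 $second" | od -c
```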

    Also for your convenience, your shell's line editor (ZLE in Zsh) lets you type nonprintable characters with Ctrl+V followed by Ctrl+<letter>. Ctrl+V then Ctrl+M produces a single carriage return, displayed as ^M; move your cursor left and right and you'll see it skips over the whole thing as one character, unlike typing ^ followed by M. While you see ^M, command-line programs see only the raw data: an actual carriage return character.

    Why is ^M, i.e. 5E 4D 0A, the 13th character in ASCII?

    You ran hexdump on the output of echo "^M", which produces three bytes: a ^, an M, and a linefeed (LF). As explained above, that's not the same as a carriage return!
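    A short comparison, using od instead of hexdump, between the two characters ^ and M typed literally and a real carriage return produced by printf.

```shell
# Literal ^ and M, plus the newline that printf '%s\n' appends.
typed=$(printf '%s\n' '^M' | od -An -tx1 | tr -d ' \n')
# A real carriage return: one byte, 0x0D (decimal 13).
real_cr=$(printf '\r' | od -An -tx1 | tr -d ' \n')

echo "typed ^M: $typed"    # typed ^M: 5e4d0a
echo "real CR:  $real_cr"  # real CR:  0d
```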

    Why is the ending byte 0a shown as .? The . is 2E in hex (46 in decimal). Hex 5E is 94 in decimal, and 4D is 77 in decimal.

    In its character column (for example with hexdump -C), hexdump displays every non-printable byte as a ., including carriage returns and line feeds. The hex column still shows the real value, 0a.
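    A rough sketch of the rule behind that character column: printable ASCII bytes (0x20 to 0x7E) are shown as-is and everything else becomes a dot. The ascii_column function is a hypothetical helper that approximates hexdump's behaviour with tr; it is not hexdump's own code.

```shell
# ascii_column: hypothetical helper. tr -c complements the set, so every
# byte outside [:print:] is replaced by '.'. LC_ALL=C keeps [:print:]
# limited to plain ASCII.
ascii_column() {
    LC_ALL=C tr -c '[:print:]' '.'
}

printf '^M\n' | ascii_column; echo     # ^M.
printf 'a\r\nb' | ascii_column; echo   # a..b
```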


    *Why M in particular? The convention, known as caret notation, is to add 64 to the ASCII code. A carriage return is ASCII code 13 (0x0D). Add 64 and you get 77 (0x4D), which is an uppercase M. See this page for a full listing.
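    The add-64 rule is easy to check mechanically. The caret function below is a hypothetical helper that handles control codes 0 to 31 only (DEL, code 127, is conventionally written ^? and is not covered here).

```shell
# caret: hypothetical helper. Adds 64 to a control code (0-31), converts
# the result to octal, and lets printf's %b turn \0ooo into that byte.
# Example: 13 + 64 = 77 = octal 115 = 'M'.
caret() {
    printf '^%b\n' "\\0$(printf '%o' $(($1 + 64)))"
}

caret 13   # ^M  (carriage return)
caret 10   # ^J  (linefeed)
caret 0    # ^@  (NUL)
```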