
Difference in behaviour for Unicode characters in shell


  • OS: macOS
  • Shell: zsh
  • Locale: LC_ALL and LANG set to en_US.UTF-8
  • 💡 (Unicode Code Point: U+1F4A1, UTF-8 Encoding: 0xF0 0x9F 0x92 0xA1)

I noticed some inconsistent behaviour when printing an emoji from a Go program.

When I print 💡 from my program to standard output (stdout), it is displayed correctly. Pasting the same emoji into my terminal also works, because the locale is configured for UTF-8.

Now, if I set LC_ALL and LANG to en_US.US-ASCII, pasting the emoji into my terminal results in:

�<009f><0092>�

but the emoji printed by the Go program to standard output is still displayed correctly.

But why? I expected the program's output to be gibberish as well, because the locale no longer matches the encoding of the emoji.


Solution

  • Setting LC_ALL and LANG tells programs which encoding to assume, i.e., what you expect them to send to your terminal emulator, and what they should expect to receive from it. It does not change how Terminal or iTerm2 or whatever terminal emulator you may be using actually displays things, nor how it actually sends things.

    That is, by setting LC_ALL and LANG to en_US.US-ASCII, you've lied to several programs about how your terminal emulator works. Go ignored the lie, while zsh believed it and started behaving weirdly. (Go ignores these settings deliberately, since there's basically nothing else to do anyway: its strings are UTF-8 and are written to stdout byte for byte. It's not clear precisely why zsh believed the lie and started changing behavior here, although there's probably some sort of history with meta keys at play.)
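
    To make the "Go ignored the lie" part concrete, here is a minimal sketch (not taken from the question's program; hexBytes is a hypothetical helper introduced only for illustration). It reports the locale variables on stderr and then writes the emoji to stdout. Nothing on the write path consults the locale, so the same four UTF-8 bytes from the question go out either way.

```go
package main

import (
	"fmt"
	"os"
)

// hexBytes is a hypothetical helper for this example: it renders the raw
// bytes of s as space-separated hex, e.g. "f0 9f 92 a1".
func hexBytes(s string) string {
	out := ""
	for i := 0; i < len(s); i++ {
		if i > 0 {
			out += " "
		}
		out += fmt.Sprintf("%02x", s[i])
	}
	return out
}

func main() {
	// U+1F4A1 (💡). A Go string holds it as its UTF-8 encoding.
	const bulb = "\U0001F4A1"

	// The locale variables are visible to the program, but they are only
	// printed here for reference; they do not influence the write below.
	fmt.Fprintf(os.Stderr, "LC_ALL=%q LANG=%q\n", os.Getenv("LC_ALL"), os.Getenv("LANG"))
	fmt.Fprintf(os.Stderr, "bytes: %s\n", hexBytes(bulb))

	// The bytes are written to stdout unchanged; no transcoding to the
	// locale's character set takes place.
	os.Stdout.WriteString(bulb + "\n")
}
```

    Assuming the file is saved as main.go, running it as `LC_ALL=en_US.US-ASCII LANG=en_US.US-ASCII go run main.go | hexdump -C` should show the same f0 9f 92 a1 bytes (followed by a newline) as under en_US.UTF-8. The terminal emulator then decodes those bytes as UTF-8 on its own, which is why the emoji still appears.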