
SpiderMonkey and Unicode escapes: unexpected behavior


Does SpiderMonkey handle Unicode escapes properly? When I print a string containing Unicode escapes to standard output with SpiderMonkey, the characters come out mangled. V8 and Node.js show the output as expected.

Here's SpiderMonkey:

$ js
js> this.print("\u201cquotes\u201d")
quotes

This is worse than it looks, since the output contains binary data that isn't valid UTF-8.

Here's V8, which shows the quotes:

$ v8
V8 version 3.7.0 [sample shell]
> this.print("\u201cquotes\u201d")
“quotes”

Here's Node.js, which also shows the quotes:

$ node
> console.log("\u201cquotes\u201d")
“quotes”
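For comparison, these are the bytes a correctly behaving shell should write for this string when the terminal expects UTF-8 (followed by a trailing newline, `0a`, from `print` or `console.log`). The sketch assumes Node.js's `Buffer` API; `utf8Hex` is a hypothetical helper name. Piping each shell's output through a hex dump such as `xxd` and comparing against this is one way to see exactly what SpiderMonkey emits.

```javascript
// Hypothetical helper: return the UTF-8 encoding of a string as hex.
// Assumes Node.js (Buffer is a global there).
function utf8Hex(str) {
  return Buffer.from(str, "utf8").toString("hex");
}

// U+201C and U+201D take three bytes each in UTF-8 (e2 80 9c / e2 80 9d);
// the ASCII letters in between take one byte each.
console.log(utf8Hex("\u201cquotes\u201d"));
// prints e2809c71756f746573e2809d
```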

Edit: I'm running on Mac OS X 10.8.2 (Mountain Lion).

$ echo $LANG
en_US.UTF-8

$ js --version
JavaScript-C 1.8.5 2011-03-31

$ brew info spidermonkey
spidermonkey: stable 1.8.5, HEAD
https://developer.mozilla.org/en/SpiderMonkey
Depends on: readline, nspr
/usr/local/Cellar/spidermonkey/1.8.5 (101 files, 12M) *
https://github.com/mxcl/homebrew/commits/master/Library/Formula/spidermonkey.rb
==> Caveats
This formula installs Spidermonkey 1.8.5.
If you are trying to compile MongoDB from scratch, you will need 1.7.x instead.

Solution

  • Which revision are you running? With a checkout of mozilla-central on my machine, the output looks correct:

    mozilla-central/js/dbg$ echo $LANG 
    en_US.UTF-8
    mozilla-central/js/dbg$ ./js -e 'this.print("\u201cquotes\u201d")'
    “quotes”
    mozilla-central/js/dbg$ hg log -l1 | head -n1
    changeset:   119130:8cc32d6fa707
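It can help to separate escape parsing from output encoding. The hypothetical helper below prints each UTF-16 code unit of the string as ASCII hex, which displays the same in any terminal. It sticks to ES3-era features (`var`, `charCodeAt`) on the assumption that it also runs unchanged in an older shell such as 1.8.5. If it prints `201c 71 75 6f 74 65 73 201d`, the escapes were parsed correctly and the problem lies in how `print` encodes its output.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string as
// lowercase hex, using only ES3-era features so the same code can be
// pasted into older shells.
function codeUnitsHex(s) {
  var out = [];
  for (var i = 0; i < s.length; i++) {
    out.push(s.charCodeAt(i).toString(16));
  }
  return out.join(" ");
}

// The SpiderMonkey and V8 shells provide print(); Node.js provides console.log.
var emit = typeof print === "function" ? print : function (x) { console.log(x); };
emit(codeUnitsHex("\u201cquotes\u201d"));
// prints 201c 71 75 6f 74 65 73 201d
```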