
Does JavaScript support Unicode ranges above 0xFFFF in regular expressions?


This example regular expression (^[\u0021-\u003F\u0041-\uFFEF]+@[\u0021-\u003F\u0041-\uFFEF]+\.[\u0021-\u003F\u0041-\uFFEF]+$) filters characters by Unicode range. With \uXXXX escapes I can write ranges anywhere from \u0000 to \uFFFF, but Unicode also has characters beyond 0xFFFF. Can I use ranges above that in JavaScript regular expressions?
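A small illustration of where the \uFFFF ceiling comes from: JavaScript strings are sequences of 16-bit UTF-16 code units, and a \uXXXX escape names exactly one code unit, so a symbol above U+FFFF is stored as two units. The sketch below reuses the question's character class (the variable names are only for illustration) and shows a side effect worth knowing: because the surrogate halves fall inside \u0041-\uFFEF, the "+" version of the pattern already accepts astral symbols by matching each half separately, while a single-character match does not.

```javascript
// An astral symbol (U+1F4A9) is stored as two UTF-16 code units.
const poo = '\uD83D\uDCA9';

console.log(poo.length);                      // 2 code units, not 1 symbol
console.log(poo.charCodeAt(0).toString(16));  // "d83d" (high surrogate)
console.log(poo.charCodeAt(1).toString(16));  // "dca9" (low surrogate)
console.log(poo.codePointAt(0).toString(16)); // "1f4a9" (ES2015+: the full code point)

// The question's character class, required to match exactly one unit...
const oneChar = /^[\u0021-\u003F\u0041-\uFFEF]$/;
// ...and repeated with +, as in the original pattern.
const oneOrMore = /^[\u0021-\u003F\u0041-\uFFEF]+$/;

console.log(oneChar.test(poo));   // false: the class consumes one code unit at a time
console.log(oneOrMore.test(poo)); // true: U+D83D and U+DCA9 both lie inside \u0041-\uFFEF
```

So without further work, such a class cannot tell a genuine astral symbol apart from a pair of unrelated code units; it simply sees two units.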


Solution

  • For backwards compatibility with ECMAScript 5 and older environments, the unfortunate solution is to use surrogate pairs:

    >> '\uD83D\uDCA9'  
    '💩' // U+1F4A9 PILE OF POO  
    

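    In that case, each \uXXXX escape represents one UTF-16 code unit, i.e. one surrogate half. A high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF) together form a single astral symbol, meaning a code point above U+FFFF.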

    Link: https://mathiasbynens.be/notes/javascript-unicode
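A sketch of how the surrogate-pair approach extends from a single symbol to a whole range, shown next to the ES2015 alternative. Since ECMAScript 2015, the u flag makes a regular expression operate on code points, so \u{...} escapes and astral ranges such as [\u{1F300}-\u{1F5FF}] can be written directly. Without it (ES5 and older), a range has to be rewritten as alternations of surrogate pairs. The helpers toSurrogatePair, hex and astralRangeSource are hypothetical names written for this illustration, and the example range U+1F300 to U+1F5FF (Miscellaneous Symbols and Pictographs) is just a convenient choice.

```javascript
// Split an astral code point (U+10000..U+10FFFF) into its two UTF-16 surrogate halves.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('Not an astral code point: 0x' + codePoint.toString(16));
  }
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);  // top 10 bits of the offset
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits of the offset
  return [high, low];
}

// Format a code unit as a regex escape, e.g. 0xD83D -> "\uD83D".
const hex = (unit) => '\\u' + unit.toString(16).toUpperCase().padStart(4, '0');

// Build an ES5-compatible regex source matching every code point in [start, end].
// Both bounds must be astral; BMP parts of a range can stay in an ordinary class.
function astralRangeSource(start, end) {
  const [hiStart, loStart] = toSurrogatePair(start);
  const [hiEnd, loEnd] = toSurrogatePair(end);
  if (hiStart === hiEnd) {
    // Same high surrogate: one low-surrogate class is enough.
    return hex(hiStart) + '[' + hex(loStart) + '-' + hex(loEnd) + ']';
  }
  // First high surrogate: from loStart up to the last low surrogate.
  const parts = [hex(hiStart) + '[' + hex(loStart) + '-\\uDFFF]'];
  if (hiEnd - hiStart > 1) {
    // High surrogates strictly in between: any low surrogate may follow.
    parts.push('[' + hex(hiStart + 1) + '-' + hex(hiEnd - 1) + '][\\uDC00-\\uDFFF]');
  }
  // Last high surrogate: from the first low surrogate up to loEnd.
  parts.push(hex(hiEnd) + '[\\uDC00-' + hex(loEnd) + ']');
  return '(?:' + parts.join('|') + ')';
}

// U+1F300..U+1F5FF, written both ways.
const es5Pattern = new RegExp('^' + astralRangeSource(0x1F300, 0x1F5FF) + '$');
const es2015Pattern = /^[\u{1F300}-\u{1F5FF}]$/u;

console.log(astralRangeSource(0x1F300, 0x1F5FF));
// (?:\uD83C[\uDF00-\uDFFF]|\uD83D[\uDC00-\uDDFF])
console.log(es5Pattern.test('\uD83D\uDCA9')); // true  (U+1F4A9)
console.log(es2015Pattern.test('\u{1F4A9}')); // true
console.log(es5Pattern.test('\uD83D\uDE00')); // false (U+1F600 is outside the range)
```

The generated alternation has at most three branches (first partial block, full middle blocks, last partial block), which is why hand-writing these ranges quickly gets tedious. Where ES2015 support can be assumed, the u-flag form is shorter and also makes "." and quantifiers treat an astral symbol as one character.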