html, utf-8, html-input, maxlength

Maxlength of HTML input with UTF8 supplementary characters


I would like to give my users the option to enter emoji characters in an input field. I assumed that in 2019 this would be as trivial as setting the website's meta charset to UTF-8. However, when I test the example below in Chrome or Firefox, supplementary characters (those that take 4 bytes in UTF-8) are counted differently from other characters.
In the first input, I can only enter 2 more characters after the 💩. In the second input, I can still enter 3 more characters after the ‰, which is 3 bytes long.

What is causing this inconsistent behaviour? Is there another HTML meta setting for 4-byte characters? It worked fine in Edge 17, and even trash IE 11 counts the length correctly.

<input type="text" value="💩" maxlength="4" />
<input type="text" value="‰" maxlength="4" />

My test cases: http://jsfiddle.net/L726ryea/7/
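The byte counts in the question can be compared with the other ways of measuring a string. Below is a minimal sketch using the standard `TextEncoder` (always UTF-8, available in modern browsers and Node.js); the helper name `measure` is illustrative only and not part of any API.

```javascript
// UTF-8 encoder; TextEncoder only ever produces UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: report three different "lengths" of a string.
function measure(value) {
  return {
    utf8Bytes: encoder.encode(value).length, // bytes on the wire in UTF-8
    utf16CodeUnits: value.length,            // JavaScript string length
    codePoints: [...value].length,           // spreading iterates by code point
  };
}

const poop = measure("\u{1F4A9}"); // 💩, U+1F4A9 (outside the BMP)
const permille = measure("\u2030"); // ‰, U+2030 (inside the BMP)

console.log(poop.utf8Bytes, poop.utf16CodeUnits, poop.codePoints);             // 4 2 1
console.log(permille.utf8Bytes, permille.utf16CodeUnits, permille.codePoints); // 3 1 1
```

With `maxlength="4"`, counting UTF-16 code units leaves 4 − 2 = 2 characters after 💩 and 4 − 1 = 3 after ‰, which matches the observed behavior; a byte-based limit would instead leave 0 and 1.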


Solution

  • The HTML5 spec says that maxlength is measured against the JavaScript string length, i.e. the number of UTF-16 code units. Code points beyond U+FFFF, such as most emoji, are stored as a surrogate pair and therefore count as two code units, while ‰ (U+2030) counts as one. This explains the behavior you're seeing.
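If a limit in code points is wanted instead of code units, it has to be enforced in script, because `maxlength` itself cannot be switched to a different unit. A minimal sketch follows; `codePointLength` and `truncateToCodePoints` are hypothetical helpers, not browser APIs.

```javascript
// Hypothetical helper: count Unicode code points rather than UTF-16 code
// units. Spreading a string iterates by code point, so a surrogate pair
// (e.g. an emoji above U+FFFF) counts once.
function codePointLength(value) {
  return [...value].length;
}

// Hypothetical helper: keep at most `max` code points without ever
// splitting a surrogate pair in half.
function truncateToCodePoints(value, max) {
  return [...value].slice(0, max).join("");
}

const sample = "\u{1F4A9}abcd"; // 💩 followed by four ASCII letters

console.log(sample.length);                   // 6 (code units, what maxlength counts)
console.log(codePointLength(sample));         // 5
console.log(truncateToCodePoints(sample, 4)); // 💩abc
```

Note that code points are still not what users perceive as characters: an emoji sequence joined with U+200D (zero-width joiner) or a flag is several code points, so counting user-perceived characters would need grapheme segmentation such as `Intl.Segmenter`.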