Tags: encoding, coldfusion, escaping, lucee, esapi

ColdFusion/Lucee Encoding Issue When Using EncodeForHTML


I'm running into an issue when using EncodeForHTML on certain characters (emojis in this case).

The text in this case is: ⌛️a😊b👍c😟 💥🍉🍔 💩 🤦🏼‍♀️🤦🏼‍♀️🤦🏼‍♀️ 😘

Now if I just output it directly:

<cfoutput>#txt#</cfoutput>

It displays correctly, no issues, but if I run it through EncodeForHTML first:

<cfoutput>#EncodeForHTML(txt)#</cfoutput>

I get this ⌛️a��b��c�� ������ �� ����‍♀️����‍♀️����‍♀️ ��

I tested it with EncodeForXML and esapiEncode as well to be sure; all give me the same result. I've verified that the encoding settings in Lucee are UTF-8 and that the meta charset tag is also set to UTF-8. I can't find any documentation on EncodeForHTML saying whether it changes the character encoding, whether it requires a specific character encoding, or whether it has any known issues with emojis or certain code points.

I appreciate any help or clarification anyone can provide.

Edit: Thank you everyone. Wish I could accept multiple answers.


Solution

  • I was required to sanitize emojis to ensure that third-party content was cross-compatible with external services. Some of the content contained emojis, which caused export/import problems. I wrote a ColdFusion wrapper for the emoji-java library to identify, sanitize and convert emojis.

    https://github.com/JamoCA/cf-emoji-java

    For example, the parseToAliases() function "replaces all the emoji's unicodes found in a string by their aliases".

    emojijava = new emojijava();
    emojijava.parseToAliases('I like 🍕');   // I like :pizza:
    

    To "encode" you could use either the parseToHtmlDecimal() or parseToHtmlHexadecimal() functions prior to using EncodeForHTML().

    emojijava = new emojijava();
    test = emojijava.parseToHtmlDecimal('I like 🍕');   // I like &#127829;
    EncodeForHTML(test);
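
    If pulling in the library is not an option, the same idea can be sketched in plain CFML. The function below is a hypothetical helper (supplementaryToHtmlDecimal is not part of Lucee, ESAPI, or cf-emoji-java), and it assumes the engine lets you call the underlying java.lang.String methods on a CFML string, as Lucee and Adobe ColdFusion normally do. It only rewrites code points above U+FFFF, which Java stores as surrogate pairs, into decimal references such as &#128522; for 😊 (U+1F60A). Everything else is left untouched.

    ```cfml
    <cfscript>
    // Hypothetical helper, not a built-in: converts each supplementary code
    // point (above U+FFFF) to a decimal numeric character reference.
    function supplementaryToHtmlDecimal(required string input) {
        var jChar = createObject("java", "java.lang.Character");
        var jStr  = javaCast("string", arguments.input);
        var out   = "";
        var i     = 0;              // zero-based: used with Java String methods
        var n     = jStr.length();  // length in UTF-16 code units
        var cp    = 0;

        while (i < n) {
            // Reads a full code point, joining a surrogate pair if present.
            cp = jStr.codePointAt(javaCast("int", i));
            if (cp > 65535) {
                // "##" is an escaped literal "#" inside a CFML string.
                out &= "&##" & cp & ";";
            } else {
                // mid() is one-based, hence i + 1.
                out &= mid(arguments.input, i + 1, 1);
            }
            // Advance by 1 for BMP characters, by 2 for surrogate pairs.
            i += jChar.charCount(javaCast("int", cp));
        }
        return out;
    }

    writeOutput(supplementaryToHtmlDecimal(txt));
    </cfscript>
    ```

    Whether the result can then be passed through EncodeForHTML() without the & and # being escaped a second time is worth checking on your engine version before relying on it.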