
Normalize String in ColdFusion


I'm trying to normalize a string in ColdFusion.

I want to use the Java class java.text.Normalizer for this, as CF doesn't have any similar functions as far as I know.

Here's my current code:

<cfset normalizer = createObject( "java", "java.text.Normalizer" ) />
<cfset string = "äéöè" />
<cfset string = normalizer.normalize(string, createObject( "java", "java.text.Normalizer$Form" ).NFD) />
<cfset string = ReReplace(string, "\\p{InCombiningDiacriticalMarks}+", "") />
<cfoutput>#string#</cfoutput>

Any ideas why it always outputs äéöè and not a normalized string?


Solution

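  • In ColdFusion, unlike in Java, you don't need to escape backslashes in string literals. Because the backslash is doubled, your pattern is read as a literal backslash followed by the literal text p{InCombiningDiacriticalMarks}, so it can only match input that contains an actual backslash. Your string has none, so no replacement happens.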

    Other than that, your code is correct. If you check Len(string) at the time of the output, you will see that the string is 8 characters long, not 4. This is an effect of the normalize call: NFD splits each accented letter into a base letter followed by a combining mark.

    However, remember that the decomposed string is still an equivalent representation of the original, so it is not surprising that you cannot tell the difference visually. This is correct Unicode rendering in action.
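
For comparison, here is a minimal sketch of the same two steps in plain Java, where the regex backslash does have to be doubled inside the string literal. The class and method names (StripDiacritics, decompose, stripMarks) are made up for illustration and do not come from the question.

```java
import java.text.Normalizer;

public class StripDiacritics {

    // Step 1: NFD splits each accented letter into base letter + combining mark.
    public static String decompose(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD);
    }

    // Step 2: remove every combining mark left behind by the decomposition.
    public static String stripMarks(String input) {
        // In a Java string literal "\\p" is the regex \p.
        // In CFML the same pattern is written with a single backslash.
        return decompose(input).replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // "äéöè" written with escapes to avoid source-encoding problems.
        String original = "\u00e4\u00e9\u00f6\u00e8";

        System.out.println(original.length());            // 4
        System.out.println(decompose(original).length()); // 8
        System.out.println(stripMarks(original));         // aeoe
    }
}
```

Two things may matter when carrying this back to CFML. REReplace replaces only the first match unless "all" is passed as the scope argument. Whether its regex engine accepts \p{InCombiningDiacriticalMarks} at all depends on the engine, which the answer above does not confirm. Calling the Java method replaceAll directly on the normalized string, with a single backslash in the pattern, is one possible way around both points.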