
How to prevent Unicode character corruption when using getPageContext().getRequest().getParameterValues()?


We have a scenario where a page submits multiple fields with the same name. To work around ColdFusion's default behavior of joining these into a comma-delimited string, without changing that behavior application-wide, in certain places we access the field values as an array using getPageContext().getRequest().getParameterValues("#fieldname#").

The problem is that submitted Unicode characters are being corrupted. For example, "El celular que compré está averiado" in a field array comes back as the string "El celular que comprÃ© estÃ¡ averiado". If I dump getHTTPRequestData(), I can see that the correctly URL-encoded value El+celular+que+compr%C3%A9+est%C3%A1+averiado is sent to the server.
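A minimal plain-Java sketch of the corruption described above, assuming the web server decodes the percent-encoded bytes as windows-1252 instead of UTF-8 (the class and method names are hypothetical). The bytes %C3%A9 (é in UTF-8) read as windows-1252 become the two characters "Ã©", which is exactly the pattern in the corrupted string.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // The raw form value from the question, as shown by getHTTPRequestData().
    static final String RAW = "El+celular+que+compr%C3%A9+est%C3%A1+averiado";

    // Turn '+' into spaces and %XX sequences into bytes, then decode those
    // bytes with the given charset (URLDecoder.decode(String, Charset), Java 10+).
    static String decodeWith(Charset charset) {
        return URLDecoder.decode(RAW, charset);
    }

    public static void main(String[] args) {
        // Correct: bytes interpreted as UTF-8.
        System.out.println(decodeWith(StandardCharsets.UTF_8));
        // Corrupted: the same bytes interpreted as windows-1252.
        System.out.println(decodeWith(Charset.forName("windows-1252")));
    }
}
```

Running it prints the intended sentence first and the "comprÃ© estÃ¡" version second, which suggests the data arrives intact and is only being decoded with the wrong charset.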

Is the Java string not being handled correctly by CF? Is there any way to resolve this on a non-application-wide basis, other than parsing getHTTPRequestData().content, which we really don't want to do?


Solution

  • This is most likely because your web server is not using UTF-8 internally to decode request parameters. You don't normally see this when accessing variables through the url scope, because CF has already converted them for you, but you can see the difference when you look at cgi.query_string or at getPageContext().getRequest().getParameterValues(...).

    In your case it looks like the values were decoded as windows-1252. I had a similar issue with IIS 7.5 and IIS 8. Assuming you can't, or don't want to risk, changing your web server configuration, this workaround should work for you:

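    webserverEncodedValues = getPageContext().getRequest().getParameterValues(fieldname); // String[], one entry per same-named field
    utf8Values = [];
    for (i = 1; i <= arrayLen(webserverEncodedValues); i++) {
        binaryValue = CharsetDecode(webserverEncodedValues[i], "windows-1252"); // mis-decoded string -> original bytes
        arrayAppend(utf8Values, CharsetEncode(binaryValue, "utf-8")); // original bytes -> string, read as UTF-8
    }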
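The CharsetDecode/CharsetEncode round trip above can be sketched in plain Java, since ColdFusion runs on the JVM. This is an illustration of the same idea, not the CF implementation; the class and method names are hypothetical, and it assumes the server decoded the parameters as windows-1252. Because getParameterValues() returns a String[] (or null when the field is absent), every element is repaired.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RepairParameterValues {
    // Assumption: the web server decoded the request bytes as windows-1252.
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Same idea as CharsetDecode(value, "windows-1252") followed by
    // CharsetEncode(bytes, "utf-8"): recover the original bytes, then
    // re-read them as UTF-8.
    static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(WINDOWS_1252);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    // Repair every value of a multi-valued parameter.
    static String[] repairAll(String[] values) {
        if (values == null) {
            return null; // parameter not present in the request
        }
        return Arrays.stream(values)
                .map(RepairParameterValues::repair)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] fromServer = {"El celular que comprÃ© estÃ¡ averiado", "plain ASCII"};
        System.out.println(Arrays.toString(repairAll(fromServer)));
    }
}
```

ASCII-only values pass through unchanged. One caveat worth checking against your own data: windows-1252 leaves the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so characters whose UTF-8 encoding contains one of them (for example Á, which is C3 81, or Í, which is C3 8D) may not survive this round trip.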