We're encountering a character encoding issue when reading a UTF-8 query string. A separate, external application constructs links to our Orbeon application such as:
http://localhost:8080/ops/encoding-test/?message=hello%20world
http://localhost:8080/ops/encoding-test/?message=it%E2%80%99s%20a%20message
Our application's model reads the query string with the oxf:request processor and then displays the string in a view. In the first case above, the application correctly displays "hello world". In the second case, %E2%80%99 is the percent-encoded UTF-8 form of a typographic apostrophe (U+2019, RIGHT SINGLE QUOTATION MARK), and it causes the application to fail with:
2012-09-13 12:21:43,383 ERROR XSLTTransformer - Error at line 174 of oxf:/config/theme-examples.xsl:
Illegal HTML character: decimal 128
2012-09-13 12:21:43,384 ERROR ProcessorService - Exception at line 174 of oxf:/config/theme-examples.xsl
; SystemID: oxf:/config/theme-examples.xsl; Line#: 174; Column#: -1
org.orbeon.saxon.trans.XPathException: Illegal HTML character: decimal 128
The error refers to %80, the second byte of the multi-byte encoding of the apostrophe. Note that in the log, not only does the theme raise an exception, but the XForms Inspector does as well.
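To make the byte-level claim concrete, here is a small standalone Java sketch (the class and method names are hypothetical, not part of Orbeon) that prints the UTF-8 bytes of U+2019 as percent escapes. The second byte is 0x80, i.e. decimal 128, the value reported in the error.

```java
import java.nio.charset.StandardCharsets;

public class ApostropheBytes {
    // Percent-encodes every UTF-8 byte of the input, e.g. U+2019 -> %E2%80%99
    static String percentEncodeUtf8(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // b & 0xFF turns Java's signed byte into its unsigned value 0..255
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String apostrophe = "\u2019"; // RIGHT SINGLE QUOTATION MARK
        System.out.println(percentEncodeUtf8(apostrophe)); // %E2%80%99

        byte[] bytes = apostrophe.getBytes(StandardCharsets.UTF_8);
        System.out.println(bytes[1] & 0xFF); // 128, the "decimal 128" in the error
    }
}
```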
It appears that the URL is being decoded as Latin-1 instead of UTF-8: the debug processor lists it as "it???s a message", with three characters in place of the apostrophe. In my research so far, I haven't found a way in HTTP to specify the encoding of the query string itself.
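The Latin-1 hypothesis can be checked outside Orbeon with java.net.URLDecoder (the Charset overload used here requires Java 10 or later). This sketch, with a hypothetical class name, decodes the same query value both ways: as ISO-8859-1, each of the three bytes becomes its own character, and the middle one is U+0080, which matches the "decimal 128" in the error.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecoding {
    // The message parameter from the second test URL, still percent-encoded
    static final String RAW = "it%E2%80%99s%20a%20message";

    // URLDecoder collects the bytes from the %XX escapes and decodes them with the given charset
    static String decodeAs(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }

    public static void main(String[] args) {
        String asUtf8 = decodeAs(RAW, StandardCharsets.UTF_8);
        String asLatin1 = decodeAs(RAW, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // 14: the apostrophe is one char
        System.out.println(asLatin1.length()); // 16: the apostrophe became three chars

        // Code points of the three chars that Latin-1 produced for the apostrophe
        for (int i = 2; i < 5; i++) {
            System.out.println((int) asLatin1.charAt(i)); // 226, then 128, then 153
        }
    }
}
```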
Any guidance and assistance is appreciated!
(cross posted to ops-users mailing list at http://mail-archive.ow2.org/ops-users/2012-09/msg00033.html)
Orbeon Forms relies on what is returned by the servlet API: see getParameterMap() in ServletExternalContext. So this seems to be something you need to set at the application server level; if you are using Tomcat, you can do so by adding URIEncoding="UTF-8" to the <Connector> element in conf/server.xml.
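If changing the connector is not an option, a common application-level workaround is to re-decode a value that the container has already decoded as ISO-8859-1. The sketch below assumes exactly that (each char of the mis-decoded string holds one original byte); the helper name is hypothetical, and this is not something Orbeon Forms itself does.

```java
import java.nio.charset.StandardCharsets;

public class LatinToUtf8Repair {
    // Hypothetical helper. ASSUMPTION: the container decoded the query string as ISO-8859-1,
    // so encoding back to ISO-8859-1 recovers the original bytes exactly.
    static String latin1MisdecodedToUtf8(String value) {
        byte[] originalBytes = value.getBytes(StandardCharsets.ISO_8859_1);
        // Reinterpret the recovered bytes as the UTF-8 the client actually sent
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the container hands over for it%E2%80%99s%20a%20message under Latin-1 decoding
        String misdecoded = "it\u00E2\u0080\u0099s a message";
        String fixed = latin1MisdecodedToUtf8(misdecoded);
        System.out.println(fixed.equals("it\u2019s a message")); // true
    }
}
```

The connector setting is still the better fix: once URIEncoding="UTF-8" is in place, applying this repair to an already correct value would corrupt it, because characters outside Latin-1 such as U+2019 cannot survive the getBytes(ISO_8859_1) step.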