java unicode character-encoding weblogic jspx

Encoding errors in .jspx


I'm currently trying to deploy some RSS feeds on a WebLogic Application Server. The feeds' views are .jspx files, like the one below:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" 
    xmlns:georss="http://www.georss.org/georss"
    xmlns:jsp="http://java.sun.com/JSP/Page"
    xmlns:c="http://java.sun.com/jsp/jstl/core"
    xmlns:fmt="http://java.sun.com/jsp/jstl/fmt"
    xmlns:fn="http://java.sun.com/jsp/jstl/functions"
    xmlns:util="http://example.com/util">
    <jsp:directive.page pageEncoding="utf-8" contentType="application/xhtml+xml" /> 

    <jsp:useBean id="now" class="java.util.Date" scope="page" />

    [...]

    <c:forEach var="category" items="${categories}">
    <entry>
        <title>${util:htmlEscape(category.label)}</title>
        <id>${category.id}</id>
        <c:if test="${empty parentId}">
        <link href="${util:htmlEscape(fullRequest)}?parentId=${category.id}" />
        </c:if>
        <summary>${util:htmlEscape(category.localizedLabel)}</summary> 
    </entry>
    </c:forEach>
</feed>

The problem is that on my local development server (Apache Tomcat 6.0) everything renders fine, but on the WebLogic server I get all the UTF-8 characters back mangled.

In Firefox, I see something like <summary>Formaci�n</summary>. The byte sequence for the strange character is ef bf bd (the UTF-8 encoding of U+FFFD, the Unicode replacement character), and I get it for every non-ASCII character in my test data (á, ó, í). I've checked the content type and encoding in Firebug and they look correct (Content-Type: application/xhtml+xml; charset=UTF-8).
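
One way a page can end up showing ef bf bd is sketched below. It assumes, hypothetically, that the text was encoded as ISO-8859-1 somewhere along the way and then decoded as UTF-8; nothing in the question confirms that this is what WebLogic does. The class and method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

final class ReplacementCharDemo {

    /** Formats bytes as lowercase, space-separated hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /**
     * Assumed failure mode: encode the text as ISO-8859-1, then decode
     * those bytes leniently as UTF-8. Malformed input is replaced with
     * U+FFFD instead of raising an error.
     */
    static String misdecode(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String shown = misdecode("Formaci\u00f3n");
        // The accented letter has been replaced by U+FFFD.
        System.out.println(shown.indexOf('\uFFFD') >= 0); // true
        // U+FFFD itself, written out as UTF-8.
        System.out.println(hex("\uFFFD".getBytes(StandardCharsets.UTF_8))); // ef bf bd
    }
}
```

Seeing ef bf bd therefore means the original character was already lost by the time those bytes were produced; it does not reveal which wrong encoding was involved.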

In Chrome, the content gets truncated at the first occurrence of the strange character, with the error message: This page contains the following errors: error on line 1 at column 523: Encoding error.
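
Chrome's behaviour is consistent with a strict decoder that stops at the first malformed byte instead of substituting a placeholder. A minimal sketch of such a check, using the standard java.nio.charset API, is shown here; the class name is hypothetical, and the ISO-8859-1 input is only an assumed example of bytes that are not valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictUtf8Check {

    /** Returns true if the bytes are well-formed UTF-8, false otherwise. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    // REPORT makes the decoder throw instead of inserting U+FFFD.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String label = "Formaci\u00f3n";
        // Correctly encoded text passes the strict check.
        System.out.println(isValidUtf8(label.getBytes(StandardCharsets.UTF_8)));      // true
        // The same text in ISO-8859-1 is rejected as malformed UTF-8.
        System.out.println(isValidUtf8(label.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```

Running a check like this over the raw response body (saved with a tool that does not re-encode it) would show whether the server is sending bytes that match the charset it declares.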

I'm not sure what's happening, but I think it's related to something that the web server is doing, considering that on my local Tomcat everything's ok. Any ideas are welcome.

Thanks,
Alex


Solution

  • The issue came from the order of the attributes in the jsp:directive.page element, combined with the fact that I wasn't including the charset in the contentType attribute.

    After switching:

    <jsp:directive.page pageEncoding="utf-8" contentType="application/xhtml+xml" />
    

    to:

    <jsp:directive.page contentType="application/xhtml+xml; charset=UTF-8" 
         pageEncoding="UTF-8" />
    

    The characters came out fine. I fiddled around a bit more, and, curiously, found out that this:

    <jsp:directive.page pageEncoding="UTF-8"
          contentType="application/xhtml+xml; charset=UTF-8" />
    

    doesn't work, even though it differs from the working version only in the order of the attributes. I don't really understand why, but I'm guessing that it's a bug in WebLogic. The version I deployed on was 10.0.