
Java + unicode + HttpServletResponse = fail


I'm trying to make sure my data path -- a Tomcat servlet getting data into and out of a MySQL database via JDBC -- handles Unicode directly.

I've been able to verify that I can read/write Unicode from the database. (When I debug Tomcat in Eclipse, I see the result retrieved from the database correctly.) But when I point my browser at my Tomcat servlet, a string like "García" (=Garci{U+0301}a) turns into "Garci?a" in the browser.

I'm using this code fragment, which uses XMLStreamWriter, to initialize the XML output (request and response are the servlet's HttpServletRequest and HttpServletResponse), and I declare the result as UTF-8:

final protected HttpServletRequest request;
final protected HttpServletResponse response;
   ...

boolean handleRefreshMetadata()
{
    String s = request.getParameter("ids");
    Integer id = Integer.parseInt(s);
    boolean b = refreshMetadata(id); 
    response.setContentType("text/xml");
    try {
        PrintWriter writer = response.getWriter();
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        XMLStreamWriter xmlwriter = factory.createXMLStreamWriter(writer);      

        xmlwriter.writeStartDocument("UTF-8", "1.0");
        xmlwriter.writeStartElement("response");
        xmlwriter.writeAttribute("success", b ? "true" : "false");
        if (b && (id != null))
        {
            loadArticleFromID(getConnection(), xmlwriter, id);
        }
        xmlwriter.writeEndDocument();
        xmlwriter.flush();
        xmlwriter.close();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (XMLStreamException e) {
        e.printStackTrace();
    }
    catch (SQLException e) {
        e.printStackTrace();
    }
    return b;
}

Am I missing something?


Solution

  • Darnit, I figured it out:

    instead of

    response.setContentType("text/xml");
    

    I need to do:

    response.setContentType("text/xml; charset=utf-8");
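
Why this probably helps: response.getWriter() encodes characters with the response's character encoding, and the Servlet API falls back to ISO-8859-1 when no charset has been set. Passing "UTF-8" to writeStartDocument only writes the XML declaration; it does not change how the underlying Writer encodes. ISO-8859-1 has no code point for the combining accent U+0301, so the encoder substitutes "?". Here is a minimal sketch of that effect using only the JDK. The ByteArrayOutputStream stands in for the servlet response, and CharsetDemo and encode are hypothetical names made up for the illustration, not part of the servlet code above.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: imitates a PrintWriter bound to a given charset,
// roughly what response.getWriter() hands back for that response encoding.
class CharsetDemo {

    // Writes the text through a PrintWriter using the given charset and
    // returns the bytes that would go out on the wire.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(sink, charset));
        writer.print(text);
        writer.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // "Garci" + combining acute accent (U+0301) + "a", as in the question.
        String name = "Garci\u0301a";

        // Decode each byte sequence with the same charset the writer used.
        String latin1 = new String(encode(name, StandardCharsets.ISO_8859_1),
                                   StandardCharsets.ISO_8859_1);
        String utf8 = new String(encode(name, StandardCharsets.UTF_8),
                                 StandardCharsets.UTF_8);

        // The accent is not representable in ISO-8859-1, so it becomes '?'.
        System.out.println(latin1.equals("Garci?a")); // prints "true"
        // UTF-8 can represent it, so the string survives the round trip.
        System.out.println(utf8.equals(name));        // prints "true"
    }
}
```

In the servlet itself, the charset has to be set before getWriter() is called, because the writer's encoding is fixed at that point. Calling response.setCharacterEncoding("UTF-8") before getWriter() should have the same effect as putting the charset in the content type.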