Search code examples
javajsputf-8character-encodingtomcat7

Tomcat's JspWriter not correctly encoding


I have a default setup of Tomcat 7 and everything java-related configured to use utf-8.

This does not work (utf-8 characters are mangled):

<%@ page language="java" pageEncoding="utf-8" contentType="text/html; charset=utf-8"%>
<%@ page import="java.net.*" %>
<%@ page import="java.io.*" %>
<%
    URL target = new URL("http://en.wikipedia.org/wiki/Main_Page");
    Reader input = new BufferedReader(new InputStreamReader(target.openStream()));
    StringWriter buffer = new StringWriter();
    char[] chrs = new char[1024 * 4];
    int n = 0;
    while (-1 != (n = input.read(chrs)))
    {
        buffer.write(chrs, 0, n);
    }
    StringReader reader = new StringReader(buffer.toString());
    n = 0;
    while (-1 != (n = reader.read(chrs)))
    {
        out.write(chrs, 0, n);
    } 
%>

This does, but logs IllegalStateExceptions:

<%@ page language="java" pageEncoding="utf-8" contentType="text/html; charset=utf-8"%>
<%@ page import="java.net.*" %>
<%@ page import="java.io.*" %>
<%
    URL target = new URL("http://en.wikipedia.org/wiki/Main_Page");
    Reader input = new BufferedReader(new InputStreamReader(target.openStream()));
    StringWriter buffer = new StringWriter();
    char[] chrs = new char[1024 * 4];
    int n = 0;
    while (-1 != (n = input.read(chrs)))
    {
        buffer.write(chrs, 0, n);
    }
    StringReader reader = new StringReader(buffer.toString());
    OutputStreamWriter output = new OutputStreamWriter(response.getOutputStream());
    n = 0;
    while (-1 != (n = reader.read(chrs)))
    {
        output.write(chrs, 0, n);
    }
%>

I've been searching but found no answers. Is this a bug in Tomcat, or is there something I'm missing?


Solution

  • When you construct InputStreamReader without specifying a charset as 2nd argument, then the platform default encoding will be used, which is often ISO-8859-1. You need to specify the same charset as specified in the response header of the target URL, which is UTF-8.

    input = new BufferedReader(new InputStreamReader(target.openStream(), "UTF-8"));
    

    The IllegalStateException is caused because you're doing this in a JSP instead of a Servlet. The JSP internally uses response.getWriter(), but you're calling response.getOutputStream() in a JSP scriptlet. This cannot be done simultaneously as explained in their javadocs.