I have a default setup of Tomcat 7, with everything Java-related configured to use UTF-8.
This does not work (UTF-8 characters are mangled):
<%@ page language="java" pageEncoding="utf-8" contentType="text/html; charset=utf-8"%>
<%@ page import="java.net.*" %>
<%@ page import="java.io.*" %>
<%
    URL target = new URL("http://en.wikipedia.org/wiki/Main_Page");
    Reader input = new BufferedReader(new InputStreamReader(target.openStream()));
    StringWriter buffer = new StringWriter();
    char[] chrs = new char[1024 * 4];
    int n = 0;
    while (-1 != (n = input.read(chrs)))
    {
        buffer.write(chrs, 0, n);
    }
    StringReader reader = new StringReader(buffer.toString());
    n = 0;
    while (-1 != (n = reader.read(chrs)))
    {
        out.write(chrs, 0, n);
    }
%>
This does, but logs IllegalStateExceptions:
<%@ page language="java" pageEncoding="utf-8" contentType="text/html; charset=utf-8"%>
<%@ page import="java.net.*" %>
<%@ page import="java.io.*" %>
<%
    URL target = new URL("http://en.wikipedia.org/wiki/Main_Page");
    Reader input = new BufferedReader(new InputStreamReader(target.openStream()));
    StringWriter buffer = new StringWriter();
    char[] chrs = new char[1024 * 4];
    int n = 0;
    while (-1 != (n = input.read(chrs)))
    {
        buffer.write(chrs, 0, n);
    }
    StringReader reader = new StringReader(buffer.toString());
    OutputStreamWriter output = new OutputStreamWriter(response.getOutputStream());
    n = 0;
    while (-1 != (n = reader.read(chrs)))
    {
        output.write(chrs, 0, n);
    }
%>
I've been searching but found no answers. Is this a bug in Tomcat, or is there something I'm missing?
When you construct an InputStreamReader without specifying a charset as the second argument, the platform default encoding is used, which is often ISO-8859-1. You need to specify the same charset as the one declared in the response header of the target URL, which here is UTF-8.
Reader input = new BufferedReader(new InputStreamReader(target.openStream(), "UTF-8"));
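To see the effect in isolation, here is a minimal standalone sketch, not taken from your JSP. The class name CharsetDemo and the helper decode are made up for illustration, and it assumes Java 7 or newer (for StandardCharsets and try-with-resources). It decodes the same UTF-8 bytes once with ISO-8859-1 and once with UTF-8:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original JSP.
public class CharsetDemo {

    // Reads the whole stream and decodes it with the given charset.
    public static String decode(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "caf\u00e9" is "cafe" with an acute accent on the e; in UTF-8 the
        // accented letter takes two bytes, so the array holds five bytes.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        String wrong = decode(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);
        String right = decode(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);

        System.out.println(wrong.length()); // prints 5: each byte became its own char
        System.out.println(right.length()); // prints 4: the two bytes decoded to one char
    }
}
```

The wrongly decoded string has one extra character, which is the kind of mangling you see in the page output.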
The IllegalStateException is caused because you're doing this in a JSP instead of a Servlet. The JSP internally uses response.getWriter(), but you're calling response.getOutputStream() in a JSP scriptlet. The two cannot both be used on the same response, as explained in the javadocs of those methods.
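One way to avoid both problems is to keep writing through a Writer and to decode the input with an explicit charset. The following is only a sketch: the class StreamCopy and its copy method are hypothetical names of my own, not an existing API. Since the JSP implicit object out is a java.io.Writer, it could be passed as the last argument.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper, shown only to illustrate the approach.
public class StreamCopy {

    // Decodes the bytes from 'in' with the given charset and writes the
    // resulting characters to 'out'. The caller stays responsible for
    // closing the stream and for flushing or closing the writer.
    public static void copy(InputStream in, Charset charset, Writer out) throws IOException {
        Reader reader = new InputStreamReader(in, charset);
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }
}
```

In the scriptlet this would be called along the lines of StreamCopy.copy(target.openStream(), Charset.forName("UTF-8"), out); with no call to response.getOutputStream() anywhere in the page. If you really need the raw output stream, do the work in a Servlet instead.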