
Why does 'ServletContext#setRequestCharacterEncoding' have no effect on 'HttpServletRequest#getReader'?


Since Servlet 4.0, we can set the default character encoding used for reading request bodies with ServletContext#setRequestCharacterEncoding.

I expected the character encoding used by HttpServletRequest#getReader to be set by ServletContext#setRequestCharacterEncoding as well (*).

However, the reader returned by HttpServletRequest#getReader does not seem to decode characters with the encoding set by ServletContext#setRequestCharacterEncoding.

My questions are:

  • Why does ServletContext#setRequestCharacterEncoding have no effect on HttpServletRequest#getReader (while it does affect HttpServletRequest#getParameter)?
  • Is there any specification that describes this behavior of ServletContext#setRequestCharacterEncoding and HttpServletRequest#getReader?

(I have read the Servlet Specification Version 4.0, but could not find anything describing this behavior.)

I created a simple WAR application to test ServletContext#setRequestCharacterEncoding.

[Env]

  • Tomcat 9.0.19 (default configuration, unchanged)
  • JDK 11
  • Windows 8.1

[index.html]

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
</head>
<body>
    <form action="/SimpleWarApp/app/simple" method="post">
        <!-- The value is Japanese character '\u3042' -->
        <input type="text" name="hello" value="あ"/>
        <input type="submit" value="submit!"/>
    </form>
    <button type="button" id="the_button">post</button>
    <script>
        document.getElementById('the_button').addEventListener('click', function() {
            var xhttp = new XMLHttpRequest();
            xhttp.open('POST', '/SimpleWarApp/app/simple');
            xhttp.setRequestHeader('Content-Type', 'text/plain');
            // The body content is Japanese character '\u3042'
            xhttp.send('あ');
        });
    </script>
</body>
</html>

[InitServletContextListener.java]

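import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

// Sets the web-app-wide default request encoding at startup (Servlet 4.0).
@WebListener
public class InitServletContextListener implements ServletContextListener {
    @Override
    public void contextInitialized(ServletContextEvent sce) {
        sce.getServletContext().setRequestCharacterEncoding("UTF-8");
    }
}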

[SimpleServlet.java]

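import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/app/simple")
@SuppressWarnings("serial")
public class SimpleServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        // req.setCharacterEncoding("UTF-8"); // uncommented in Case 3
        System.out.println("requestCharacterEncoding : " + req.getServletContext().getRequestCharacterEncoding());
        System.out.println("req.getCharacterEncoding() : " + req.getCharacterEncoding());

        String hello = req.getParameter("hello");
        if (hello != null) {
            // Form submission: the parameter is decoded by the container
            System.out.println("hello : " + hello);
        } else {
            // XHR text/plain body: read through getReader()
            System.out.println("body : " + req.getReader().readLine());
        }
    }
}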

There are no servlet filters; the three files above make up the whole WAR application (source on GitHub).

Case 1: When I submit the form with the parameter 'hello', its value is decoded correctly:

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
hello : あ
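In Case 1 the browser submits the form as application/x-www-form-urlencoded, so the UTF-8 bytes of あ arrive percent-encoded as hello=%E3%81%82, and getParameter has to percent-decode them with the request encoding. The following is a minimal JDK-only sketch of that decoding step, not Tomcat's parser; the class FormBodyDecoding and its method are hypothetical names for illustration.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormBodyDecoding {

    // Percent-decodes one form value with the given charset, as a container
    // must do for getParameter() on an application/x-www-form-urlencoded body.
    public static String decodeValue(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset); // Java 10+ overload
    }

    public static void main(String[] args) {
        String body = "hello=%E3%81%82"; // what the form in index.html submits
        String value = body.substring(body.indexOf('=') + 1);

        // With UTF-8 (the context default) the three bytes become one character.
        System.out.println(decodeValue(value, StandardCharsets.UTF_8).equals("\u3042")); // true
        // With ISO-8859-1 each byte becomes its own character.
        System.out.println(decodeValue(value, StandardCharsets.ISO_8859_1).length());    // 3
    }
}
```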

Case 2: When I click 'post' to send the text body, the request body is not decoded correctly, even though I confirmed that it is UTF-8 encoded (bytes E3 81 82):

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : ???
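The bytes in Case 2 are valid UTF-8, so the garbling has to come from the charset the reader applies. Assuming the reader falls back to ISO-8859-1 (the servlet specification's default when no encoding is in effect), this JDK-only sketch reproduces the mismatch. How the resulting characters show up as ??? depends on the console encoding, so the sketch prints code points instead; BodyBytesDemo is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BodyBytesDemo {

    // The three bytes observed on the wire in Case 2: U+3042 encoded as UTF-8.
    public static final byte[] BODY = {(byte) 0xE3, (byte) 0x81, (byte) 0x82};

    // Decodes the raw body with an explicit charset, as a Reader would.
    public static String decode(byte[] body, Charset charset) {
        return new String(body, charset);
    }

    public static void main(String[] args) {
        String utf8 = decode(BODY, StandardCharsets.UTF_8);
        String latin1 = decode(BODY, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 decode is U+3042: " + utf8.equals("\u3042")); // true
        System.out.println("ISO-8859-1 decode length: " + latin1.length());     // 3
        // Print code points so the console encoding does not interfere:
        // U+00E3 U+0081 U+0082
        latin1.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```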

Case 3: When I additionally call HttpServletRequest#setCharacterEncoding("UTF-8") on the first line of the servlet's doPost method (the commented-out line), the request body is decoded correctly:

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : あ
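Case 3 avoids the problem by setting the encoding per request. Another workaround on affected versions, offered here as an assumption rather than an official fix, is to skip getReader() and decode req.getInputStream() yourself with the name returned by req.getCharacterEncoding(), which already reports UTF-8 in Case 2. To keep the sketch runnable without the Servlet API, a ByteArrayInputStream stands in for the request stream; readBody and ReadBodyWorkaround are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadBodyWorkaround {

    // Hypothetical helper: decodes a request body stream with an explicit
    // encoding name, falling back to ISO-8859-1 when the name is null.
    // In a servlet it would be called as
    //   readBody(req.getInputStream(), req.getCharacterEncoding())
    public static String readBody(InputStream in, String encoding) {
        Charset charset = (encoding != null) ? Charset.forName(encoding) : StandardCharsets.ISO_8859_1;
        // The container owns the request stream, so it is not closed here.
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset));
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] body = {(byte) 0xE3, (byte) 0x81, (byte) 0x82}; // U+3042 in UTF-8
        System.out.println(readBody(new ByteArrayInputStream(body), "UTF-8").equals("\u3042")); // true
        System.out.println(readBody(new ByteArrayInputStream(body), null).length());           // 3
    }
}
```

A request body can be read through either getInputStream() or getReader(), not both; calling one after the other throws IllegalStateException, so this replaces the getReader() call rather than supplementing it.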

Case 4: When I change the header in the JavaScript to xhttp.setRequestHeader('Content-Type', 'text/plain; charset=UTF-8'), the request body is decoded correctly:

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : あ
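Case 4 works because the charset parameter of the Content-Type header is a per-request encoding, which the getCharacterEncoding Javadoc ranks above the per-web-app default. This hypothetical helper shows how such a parameter can be pulled out of the header value; it is a simplified sketch (it only strips surrounding quotes and does not handle every quoted-string edge case), not Tomcat's actual parser.

```java
import java.util.Locale;
import java.util.Optional;

public class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type value, if present.
    // Parameter names are case-insensitive, so "Charset=utf-8" also matches.
    public static Optional<String> charsetOf(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the media type
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().toLowerCase(Locale.ROOT).equals("charset")) {
                String value = param.substring(eq + 1).trim();
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=UTF-8")); // Optional[UTF-8]
        System.out.println(charsetOf("text/plain"));                // Optional.empty
    }
}
```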

Case 5: When I do not call req.getParameter("hello") at all, the request body is still not decoded correctly:

requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : ???

Case 6: When I do not call ServletContext#setRequestCharacterEncoding in InitServletContextListener, no character encoding is set at all and the body is not decoded correctly:

requestCharacterEncoding : null
req.getCharacterEncoding() : null
body : ???

[NOTE]

  • (*) I expected this because:

    • (1) The Javadoc of HttpServletRequest#getReader says

      "The reader translates the character data according to the character encoding used on the body".

    • (2) The Javadoc of HttpServletRequest#getCharacterEncoding says

      "Returns the name of the character encoding used in the body of this request".

    • (3) The Javadoc of HttpServletRequest#getCharacterEncoding also says

      "The following methods for specifying the request character encoding are consulted, in decreasing order of priority: per request, per web app (using ServletContext.setRequestCharacterEncoding, deployment descriptor)".

  • ServletContext#setResponseCharacterEncoding works fine: when I use it, the writer returned by HttpServletResponse#getWriter encodes the response body with the configured character encoding.
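The precedence quoted in (3) can be modeled as a lookup chain. This is a hypothetical sketch of the behavior the question expects from getReader(), not container code: the per-request value (setCharacterEncoding or the Content-Type charset) wins, then the per-web-app default, then, as the full Javadoc goes on to describe, a vendor-specific per-container setting. If none is present, getCharacterEncoding() returns null and the body is assumed to be decoded with ISO-8859-1, the servlet specification's default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

public class RequestEncodingResolver {

    // Hypothetical model of getCharacterEncoding(): the first non-null source
    // wins, in the priority order given by the Servlet 4.0 Javadoc.
    public static Optional<String> resolve(String perRequest, String perWebApp, String perContainer) {
        if (perRequest != null) return Optional.of(perRequest);
        if (perWebApp != null) return Optional.of(perWebApp);
        return Optional.ofNullable(perContainer);
    }

    // Charset the body reader is expected to use: the resolved encoding,
    // or ISO-8859-1 when nothing was specified (the specification default).
    public static Charset readerCharset(String perRequest, String perWebApp, String perContainer) {
        return resolve(perRequest, perWebApp, perContainer)
                .map(Charset::forName)
                .orElse(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Case 2: only the web-app default is set; the expected reader charset is UTF-8.
        System.out.println(readerCharset(null, "UTF-8", null));    // UTF-8
        // Case 4: the Content-Type charset supplies a per-request value.
        System.out.println(readerCharset("UTF-8", "UTF-8", null)); // UTF-8
        // Case 6: nothing set; getCharacterEncoding() would return null.
        System.out.println(readerCharset(null, null, null));       // ISO-8859-1
    }
}
```

Under this model Case 2 should read the body as UTF-8, which is why the observed ??? points to the container rather than to the specification.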


Solution

  • This is an Apache Tomcat bug (specific to getReader()) that will be fixed in 9.0.21 onwards, thanks to your report on the Tomcat users mailing list.

    For the curious, here is the fix.