Tags: java, spring, spring-mvc, utf-8, character-encoding

Spring MVC and UTF-8: How to work with Swedish special characters?


I am trying to search my database for the word "bäck", which contains a special Swedish character. I have a JSP page:

<%@ page pageEncoding="utf-8" contentType="text/html; charset=utf-8" %>
    ...
<form name="mainform" action="/web/admin/users/">
    <input id="keywords" type="text" name="keywords" size="30"
           value="${status.value}" tabindex="1" />
    <button class="link" type="submit">Search</button>
</form>

a filter:

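import java.io.IOException;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.springframework.web.filter.OncePerRequestFilter;

public class RequestResponseCharacterEncodingFilter extends OncePerRequestFilter {

    private String encoding;
    private boolean forceEncoding;

    // Setters let the filter's init-params from web.xml be applied as bean properties.
    public void setEncoding(String encoding) { this.encoding = encoding; }
    public void setForceEncoding(boolean forceEncoding) { this.forceEncoding = forceEncoding; }

    @Override
    protected void doFilterInternal(
            HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
            throws ServletException, IOException {
        // Sets the encoding for the request body and for the response.
        request.setCharacterEncoding(this.encoding);
        response.setCharacterEncoding(this.encoding);
        filterChain.doFilter(request, response);
    }
}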

web.xml

<web-app ...>
...
    <filter>
        <filter-name>encodingFilter</filter-name>
        <filter-class>test.testdomain.spring.RequestResponseCharacterEncodingFilter</filter-class>
        <init-param>
            <param-name>encoding</param-name>
            <param-value>UTF-8</param-value>
        </init-param>
        <init-param>
            <param-name>forceEncoding</param-name>
            <param-value>true</param-value>
        </init-param>
    </filter>
    <filter-mapping>
        <filter-name>encodingFilter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>
...
</web-app>

When I search for the word "bäck", it comes out garbled as "bÃ¤ck". The request is encoded in UTF-8: [screenshot: IE request capture]

but in the debugger, right before my filter's doFilterInternal method returns, I see: [screenshot: IDEA debugger]

What am I doing wrong? Why is the text not decoded as UTF-8?

EDIT: It is very strange. I've just tried the same query in Chrome and Mozilla Firefox and it works well there, so it seems that I have this problem only in Internet Explorer.

EDIT: Internet Explorer gives me this string: b%C3%A4ck but Mozilla Firefox and Chrome give me the string: b%E4ck. They are obviously different. Why is that?


Solution

  • Your screenshots indicate that your search keyword, bäck, is sent as part of the URL, as a URL parameter. They also indicate that it is correctly URL-encoded as UTF-8. The string you see in your debugger is typical of ISO-Latin decoding of UTF-8 encoded bytes, i.e. the HttpServletRequest parser used ISO-8859-1 to decode a UTF-8 encoded string.
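
The garbling described above can be reproduced in a few lines of plain Java. This is only an illustrative sketch: the class and method names are made up for the example, and the non-ASCII characters are written as \u escapes so that the encoding of the source file itself does not matter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the asker's application.
public class MojibakeDemo {

    // Encodes the text as UTF-8 bytes, then (wrongly) decodes those bytes as
    // ISO-8859-1, which is what happens when the decoder assumes Latin-1.
    public static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "b\u00e4ck"; // "bäck"
        String garbled = decodeUtf8BytesAsLatin1(original);

        // The single character U+00E4 became the two characters U+00C3 U+00A4.
        System.out.println(garbled.equals("b\u00c3\u00a4ck")); // prints "true"
        System.out.println(garbled.length());                  // prints "5"
    }
}
```

The two-byte UTF-8 sequence for "ä" is read as two separate Latin-1 characters, which is why one extra character shows up in the debugger.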

    So your servlet filter is of no help in interpreting it:

    request.setCharacterEncoding(this.encoding);
    response.setCharacterEncoding(this.encoding);
    

    As the javadoc says, these methods apply to the body of the HTTP request, not to its URL:

    /**
     * Overrides the name of the character encoding used in the body of this
     * request. This method must be called prior to reading request parameters
     * or reading input using getReader(). Otherwise, it has no effect.
     * 
    

    Since URL parameter parsing is the responsibility of your servlet container, the setting you should look at is probably a container-level one. For example, on Tomcat, as stated in the documentation at http://tomcat.apache.org/tomcat-7.0-doc/config/http.html:

    URIEncoding : This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used.

    By default, it uses ISO-8859-1. You should change that to UTF-8 (on Tomcat, by setting URIEncoding="UTF-8" on the Connector element in server.xml). Your request parameters will then be correctly parsed by your servlet container and passed to the HttpServletRequest object.
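
To see what the URI encoding setting changes, the sketch below decodes the string captured from Internet Explorer with both charsets, using the JDK's own URLDecoder as a stand-in for the container's parser. The class and method names are hypothetical. The second method is a commonly used stopgap for when the container configuration cannot be changed, not a replacement for fixing it.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; URLDecoder only approximates the container's parser.
public class UriDecodingDemo {

    // Decodes a %xx-escaped query value using the given charset name.
    public static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Stopgap repair for a value that was decoded as ISO-8859-1 although the
    // browser sent UTF-8: recover the original bytes, then decode them again.
    public static String repairLatin1Decoded(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "b%C3%A4ck"; // value captured from Internet Explorer

        String asLatin1 = decode(raw, "ISO-8859-1"); // the default behaviour
        String asUtf8 = decode(raw, "UTF-8");        // with UTF-8 configured

        System.out.println(asLatin1.equals("b\u00c3\u00a4ck"));                 // prints "true"
        System.out.println(asUtf8.equals("b\u00e4ck"));                         // prints "true"
        System.out.println(repairLatin1Decoded(asLatin1).equals("b\u00e4ck")); // prints "true"
    }
}
```

Note that the repair step is only safe when the value really was sent as UTF-8. Applied to a value that was sent as ISO-8859-1 and decoded correctly, it would damage the text, so configuring the container consistently remains the better fix.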

    EDIT: As you are seeing inconsistent browser behaviour, you may want to check the consistency of your HTML form. Please make sure that:

    1. Your HTTP Content-Type header and your HTML "meta" tag defining the charset are both present and agree on the charset. (Given your servlet filter, both should be UTF-8.)
    2. You actually respect that charset declaration in the body of your response (that is, you actually write UTF-8 strings from your JSP, or from whatever else produces the page).
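
The two strings reported in the question are the same word percent-encoded with two different charsets, which is why a browser's view of the page charset matters for form submission. A minimal sketch using the JDK's URLEncoder follows; the class and method names are hypothetical, and URLEncoder only approximates what a browser does when it submits a form.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class, not part of the asker's application.
public class FormEncodingDemo {

    // Percent-encodes a form value using the given charset name.
    public static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String keyword = "b\u00e4ck"; // "bäck"

        System.out.println(encode(keyword, "UTF-8"));      // prints "b%C3%A4ck"
        System.out.println(encode(keyword, "ISO-8859-1")); // prints "b%E4ck"
    }
}
```

If every browser treats the page as UTF-8, every browser should send the first form, and a container configured for UTF-8 URI decoding will then read it correctly.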