servlets, utf-8, character-encoding, iso-8859-1

Servlet receiving data both in ISO-8859-1 and UTF-8. How to URL-decode?


I have a web application (well, in fact it's just a servlet) which receives data from three different sources:

  • Source A is an HTML document written in UTF-8, and sends the data via <form method="get">.
  • Source B is written in ISO-8859-1, and sends the data via <form method="get">, too.
  • Source C is written in ISO-8859-1, and sends the data via <a href="http://my-servlet-url?param=value&param2=value2&etc">.

The servlet receives the request parameters and URL-decodes them using UTF-8. As you might expect, A works without problems, while B and C fail (you can't URL-decode as UTF-8 something that was encoded in ISO-8859-1...).
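To make the failure concrete, here is a small sketch using only the JDK's `java.net.URLDecoder` (the `Charset` overload needs Java 10 or later). The class and method names are made up for illustration, and "café" stands in for whatever non-ASCII value sources B and C actually send.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeMismatch {
    // Percent-decode a query-string value using the given charset
    // (URLDecoder.decode(String, Charset) requires Java 10+)
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // An ISO-8859-1 page encodes "café" with a single byte for é: 0xE9
        String fromIsoPage = "caf%E9";
        // As UTF-8, 0xE9 opens a multi-byte sequence that never completes,
        // so it is replaced with U+FFFD (the replacement character)
        System.out.println(decode(fromIsoPage, StandardCharsets.UTF_8).equals("caf\uFFFD")); // true
        // Decoding with the charset the page actually used gives back "café"
        System.out.println(decode(fromIsoPage, StandardCharsets.ISO_8859_1).equals("caf\u00E9")); // true
    }
}
```

Note that no exception is thrown: the bad byte silently becomes U+FFFD, which is why the breakage shows up as garbled characters rather than an error.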

I can make slight modifications to B and C, but I am not allowed to change them from ISO-8859-1 to UTF-8, which would solve all the problems.

In B, I've been able to solve the problem by adding accept-charset="UTF-8" to the <form>, so the browser sends the data in UTF-8 even though the page itself is ISO-8859-1.
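A rough way to see what accept-charset changes on the wire is to percent-encode the same value with both charsets. This sketch uses `java.net.URLEncoder` (Java 10+ `Charset` overload) as an approximation of the browser's form encoding; the class name is hypothetical, and real browsers may differ for characters outside ISO-8859-1.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormEncoding {
    // Roughly what a browser writes into the query string for one form value
    // (URLEncoder.encode(String, Charset) requires Java 10+)
    static String encodeAs(String value, Charset charset) {
        return URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        String value = "caf\u00E9"; // "café"
        // ISO-8859-1 page, no accept-charset: é is sent as one byte
        System.out.println(encodeAs(value, StandardCharsets.ISO_8859_1)); // caf%E9
        // Same page with accept-charset="UTF-8": é is sent as two bytes
        System.out.println(encodeAs(value, StandardCharsets.UTF_8));      // caf%C3%A9
    }
}
```

Source C has no form, so there is no accept-charset to set: the bytes in its hand-written href are whatever the ISO-8859-1 page produces.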

What can I do to fix C?

Alternatively, is there any way to determine the charset in the servlet, so I can URL-decode with the right encoding in each case?


Edit: I've just found this, which seems to solve my problem. I still need to run some tests to determine whether it impacts performance, but I think I'll stick with that solution.


Solution

  • I'm answering my own question in order to mark it as solved:

    I found this question, which covers exactly the same problem I was facing. A javax.servlet.Filter was the solution for me.
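The linked question isn't reproduced here, so the following is only a sketch of one common heuristic a filter could apply, not necessarily what that answer does: percent-decode the raw value to bytes, try strict UTF-8, and fall back to ISO-8859-1 if the bytes are not valid UTF-8. In a real filter the raw input would come from `HttpServletRequest.getQueryString()`, which the container leaves undecoded, and a request wrapper would expose the re-decoded parameters; that wiring is omitted, and all names below are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class CharsetFallback {
    // Turn "%XX" escapes and '+' into raw bytes without choosing a charset yet.
    // Assumes unescaped characters are ASCII; invalid hex throws NumberFormatException.
    static byte[] percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '+') {
                out.write(' ');
            } else if (c == '%' && i + 2 < s.length()) {
                out.write(Integer.parseInt(s.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    // Strict UTF-8 first; if the bytes are not valid UTF-8, assume ISO-8859-1
    static String decodeWithFallback(String encoded) {
        byte[] bytes = percentDecode(encoded);
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Any byte sequence is valid ISO-8859-1, so this cannot fail
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeWithFallback("caf%C3%A9")); // UTF-8 form (source A, or B with accept-charset)
        System.out.println(decodeWithFallback("caf%E9"));    // ISO-8859-1 link (source C)
    }
}
```

Both calls yield "café". The heuristic is not airtight: some ISO-8859-1 byte pairs, such as 0xC3 0xA9 ("Ã©"), are also valid UTF-8 and would be misread, so it works best when genuine ISO-8859-1 input rarely contains such sequences.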