I've now got the most wonderful task, the dream of all programmers. There is a roughly 15-year-old piece of software here, and I only have to fix "some bugs" in it: 32-bit Java 6, Tomcat 6, non-Unicode source code, Ant build system, and everything else I can only "like".
Note, I only have control over a .war file, so server-side settings are not an option.
Your main problem likely lies in the <bean:message> tag, although other tags may be problematic too.
The Java core has supported utf8 since its very early alpha days, but unfortunately there is an exception in the handling of .properties files. These files are always interpreted as iso8859-1 by the stream-based JDK API calls (Properties.load(InputStream)).
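A minimal sketch of this behaviour, assuming the file ends up in Properties.load(InputStream). The class and method names are made up for illustration, and the accented letter is written as a Java escape so the source stays pure ASCII:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Properties;

class PropertiesEncodingDemo {
    // Loads one key from .properties content given as raw bytes,
    // the same way Properties.load(InputStream) would read a file.
    static String loadValue(byte[] fileBytes, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) throws Exception {
        // A one-line properties file whose value is a single accented a, saved as utf8.
        byte[] utf8File = "my.property.key=\u00E1".getBytes("UTF-8");
        String value = loadValue(utf8File, "my.property.key");

        // The two utf8 bytes 0xC3 0xA1 come back as two separate characters.
        System.out.println(value.length());        // 2
        System.out.println((int) value.charAt(0)); // 195
        System.out.println((int) value.charAt(1)); // 161
    }
}
```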
The Struts1 taglibs use i18n strings addressed by keys, stored in *.properties files. Digging a little bit into the struts1 source, I've found these:
- Struts reads the .properties files with the JDK calls, thus always as iso8859-1. It is deeply hardwired into the code, there is no way to change it.
- Whatever you put into system.properties or web.xml settings, the .properties files will still always be read as iso8859-1. The locale/localekey only adds an extra extension to the name of the properties file that is actually read.
- However, Struts and the other parts of your system (for example, the JSP parser/interpreter) already do some conversion as needed, so this iso8859-1 text will be converted to utf8 if your JSP pages are correctly set up (meta headers and so on).
Furthermore, the property reader has a feature (similarly hardwired, and it cannot be disabled) that gives a little support for Unicode. It accepts Unicode characters in the form \uC0DE. Thus, after a \u (with a lowercase u; an uppercase \U is not recognized as an escape), you can give a 16-bit hex value, which can be any Unicode character that fits into 16 bits.
It always has to be exactly four hex digits long, other lengths are not allowed, but the hex digits themselves are case insensitive.
Thus,
my.property.key=árvíztűrő tükörfúrógép
...encoded as utf8, won't work, it will be interpreted as iso8859-1.
You can't simply save this string as iso8859-1 either. It can't work, because some of the accented letters (ű and ő) don't have an iso8859-1 mapping, i.e. they don't exist in the iso8859-1 encoding.
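Which characters of a given string are missing from iso8859-1 can be checked with the standard CharsetEncoder. This is only an illustrative sketch; the class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class Latin1Check {
    // Returns the characters of the text that iso8859-1 cannot represent.
    static String unencodable(String text) {
        CharsetEncoder encoder = Charset.forName("ISO-8859-1").newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!encoder.canEncode(c)) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // The sample string from above, written with Java escapes (ASCII-only source).
        String text = "\u00E1rv\u00EDzt\u0171r\u0151 t\u00FCk\u00F6rf\u00FAr\u00F3g\u00E9p";
        String missing = unencodable(text);
        // Print the code points in hex so the output does not depend on the console encoding.
        for (int i = 0; i < missing.length(); i++) {
            System.out.println(Integer.toHexString(missing.charAt(i))); // 171, then 151
        }
    }
}
```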
However, if you encode it into the format described above:
my.property.key=\u00E1rv\u00EDzt\u0171r\u0151 t\u00FCk\u00F6rf\u00FAr\u00F3g\u00E9p
then yes, it will work!
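To convince yourself, the escaped line can be fed to java.util.Properties directly. A small sketch, with hypothetical class and method names, that loads the pure-ASCII line and compares the result with the intended text:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Properties;

class EscapedPropertiesDemo {
    // Loads one key from ASCII-only .properties content.
    static String loadValue(String asciiFileContent, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(asciiFileContent.getBytes("ISO-8859-1")));
        } catch (IOException e) {
            // Not expected: the charset is always available and the stream is in memory.
            throw new IllegalStateException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Double backslashes: the file content really contains backslash, u, four hex digits.
        String fileLine = "my.property.key="
                + "\\u00E1rv\\u00EDzt\\u0171r\\u0151 t\\u00FCk\\u00F6rf\\u00FAr\\u00F3g\\u00E9p";
        // Single backslashes: the Java compiler turns these into the real characters.
        String expected = "\u00E1rv\u00EDzt\u0171r\u0151 t\u00FCk\u00F6rf\u00FAr\u00F3g\u00E9p";

        String value = loadValue(fileLine, "my.property.key");
        System.out.println(value.length());         // 22
        System.out.println(value.equals(expected)); // true
    }
}
```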
To do this conversion, Sun had a native2ascii tool, which is hard to get today (it shipped with the old JDKs and was removed in JDK 9). If you don't have it at hand, you have to dig this tool out of some archive on the net, or find a different one.
On Linux, there is a tool named uni2ascii (on debian-based distributions, you can install it with apt-get install uni2ascii), which does the correct conversion. The correct parameters are:
uni2ascii -a U myfile.properties
The result goes to stdout.
It is up to you how you integrate it into your build system (some ant/maven exec module, or simply run it manually every time the file changes).
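If neither tool is at hand, the conversion itself is small enough to do in a few lines of Java. The following is only a sketch, not a replacement for the real tools: the class name AsciiEscaper is hypothetical, it assumes the input arrives as utf8 text on stdin, and it leaves every ASCII character (including existing backslashes) untouched.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

class AsciiEscaper {
    // Replaces every non-ASCII char with a backslash, a lowercase u and four hex digits.
    // Working on UTF-16 chars means characters outside the 16-bit range
    // come out as two escapes (a surrogate pair).
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Read utf8 from stdin, write the escaped ASCII form to stdout.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(escape(line));
        }
    }
}
```

It could be called from an ant exec/java target or by hand, for example as java AsciiEscaper < messages_utf8.properties > messages.properties (both file names are placeholders).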