So I have this simple code:
public class FooBar {
    public static void main(String[] args) {
        String foo = "ğ";
        System.out.println(foo.getBytes().length);
    }
}
Let me compile and run it:
$ javac FooBar.java
$ java -Dfile.encoding=UTF-32 FooBar
4
OK, I am not surprised that the character took 4 bytes, because I told Java to use UTF-32 as the default encoding when running the program.
Let's try running the program with UTF-8 encoding:
$ java -Dfile.encoding=UTF-8 FooBar
2
All seems fine.
Now currently the class file (FooBar.class) is 451 bytes. I will change the code like this:
public class FooBar {
    public static void main(String[] args) {
        String foo = "ğğ";
        System.out.println(foo.getBytes().length);
    }
}
I compile it again and see that the file on disk is now 453 bytes.
Obviously, the string in the class file is stored on disk with UTF-8 encoding, since one extra character added 2 bytes. If I now run this .class file with UTF-32 encoding:
$ java -Dfile.encoding=UTF-32 FooBar
8
Well, all seems fine, but is there any way to tell the compiler to encode the .class file using UTF-32 for String characters?
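The system property file.encoding determines the default charset that the JVM uses at run time (for example in String.getBytes() without arguments), but it does not affect how the compiler encodes string constants in the class file.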
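To make the run-time side concrete, here is a minimal sketch that passes the charset to getBytes explicitly, so the result no longer depends on file.encoding. The class name ExplicitCharset and the helper encodedLength are made up for illustration, and the character is written as a Unicode escape so the encoding of the source file does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    // Hypothetical helper: number of bytes s occupies when encoded with cs.
    // The result does not depend on the file.encoding system property.
    public static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String foo = "\u011F"; // U+011F, the character used in the question

        System.out.println(encodedLength(foo, StandardCharsets.UTF_8));     // 2
        System.out.println(encodedLength(foo, Charset.forName("UTF-32")));  // 4

        // Only this call follows the default charset, which is what
        // -Dfile.encoding changed in the question.
        System.out.println(encodedLength(foo, Charset.defaultCharset()));
    }
}
```

The first two lines print the same values no matter which -Dfile.encoding value is passed; only the last one varies.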
Java class files have a defined binary data structure which cannot be changed (unless you write your own compiler and class loader).
Therefore the encoding of strings in the constant pool is always modified UTF-8.
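You can see the same modified UTF-8 encoding from plain Java, because DataOutputStream.writeUTF uses it as well. The following is only a sketch for illustration (the class name ModifiedUtf8Demo and the helper modifiedUtf8 are my own, not anything the compiler uses): writeUTF emits a 2-byte length prefix followed by the modified UTF-8 bytes, so each additional ğ adds 2 bytes, which matches the class file growing from 451 to 453 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ModifiedUtf8Demo {
    // Hypothetical helper: the bytes DataOutputStream.writeUTF produces for s,
    // i.e. a 2-byte length prefix followed by the modified UTF-8 encoding.
    public static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // Not expected for an in-memory stream and short strings.
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // U+011F needs 2 bytes in (modified) UTF-8.
        System.out.println(modifiedUtf8("\u011F").length);        // 2 (prefix) + 2 = 4
        System.out.println(modifiedUtf8("\u011F\u011F").length);  // 2 (prefix) + 4 = 6
    }
}
```

Whatever value you pass for -Dfile.encoding, these numbers stay the same, just like the size of the compiled class file.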