As a test, I created a file called Hello.java with the following contents:
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello world!");
    }
}
I saved this file with UTF-8 encoding.
Anyway, compiling and running the program was no problem. The file was 103 bytes long.
I then saved the file with UTF-16 BE encoding. This time the file was 206 bytes long, which is no surprise, since UTF-16 usually needs more space for the same text.
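The doubling is easy to check programmatically: for characters in the ASCII range, UTF-8 uses one byte per character while UTF-16BE always uses two. A minimal sketch (the sample string is just illustrative):

```java
import java.nio.charset.StandardCharsets;

public class EncodingSize {
    public static void main(String[] args) {
        String source = "System.out.println(\"Hello world!\");";
        // ASCII-range characters take 1 byte each in UTF-8...
        int utf8 = source.getBytes(StandardCharsets.UTF_8).length;
        // ...but 2 bytes each in UTF-16BE (this charset adds no BOM).
        int utf16 = source.getBytes(StandardCharsets.UTF_16BE).length;
        System.out.println(utf8 + " vs " + utf16); // 35 vs 70
    }
}
```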
I tried compiling the file from my terminal and got these errors:
Hello.java:4: error: illegal character: '\u0000'
}
^
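(For what it's worth, that '\u0000' is consistent with javac reading the UTF-16 bytes with a single-byte default charset, so every ASCII character on disk appears to be preceded by a NUL byte. A rough illustration of that mismatch, not javac's actual internals:)

```java
import java.nio.charset.StandardCharsets;

public class WrongDecoding {
    public static void main(String[] args) {
        // The bytes sitting on disk for a UTF-16BE-encoded "}": 0x00 0x7D.
        byte[] onDisk = "}".getBytes(StandardCharsets.UTF_16BE);
        // Decode them as if they were a one-byte-per-char encoding,
        // the way a tool expecting such an encoding would.
        String misread = new String(onDisk, StandardCharsets.ISO_8859_1);
        System.out.println(misread.charAt(0) == '\u0000'); // true
    }
}
```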
So does javac work only with UTF-8 encoded source files? Is that some kind of standard?
javac -version
javac 1.8.0_45
Also, I only know Java, but let's say you are running Python code or any other interpreted language. (Sorry if I am mistaken in thinking Python is interpreted.) Would the encoding be a problem? If not, would it have any effect on performance?
OK, so the word "true" is a reserved keyword (in a given programming language), but in what encoding is it reserved? ASCII/UTF-8 only?
How "true" is stored on the hard drive or in memory depends on the encoding the file is saved in, so must a programming language always expect to work with a particular encoding for source files?
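To make the question concrete: the keyword "true" is four characters, and those characters are the same no matter how they are serialized to bytes. A small sketch comparing the on-disk byte layouts:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeywordBytes {
    public static void main(String[] args) {
        // The same four characters, three byte layouts on disk.
        byte[] ascii = "true".getBytes(StandardCharsets.US_ASCII);  // 4 bytes
        byte[] utf8  = "true".getBytes(StandardCharsets.UTF_8);     // 4 bytes
        byte[] utf16 = "true".getBytes(StandardCharsets.UTF_16BE);  // 8 bytes
        // UTF-8 is ASCII-compatible, so these two layouts are identical.
        System.out.println(Arrays.equals(ascii, utf8)); // true
        // Decoded with the right charset, all spell the same keyword.
        System.out.println("true".equals(
                new String(utf16, StandardCharsets.UTF_16BE))); // true
    }
}
```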
Regarding javac, you can set the encoding with the -encoding parameter. Internally, Java handles strings as UTF-16, so the compiler converts everything to that.
The compiler must know the encoding in order to process the source code. It doesn't matter which compiler, interpreter, or language it is, just as a person can't take a random piece of text and simply assume it's German.
Keywords aren't reserved in any specific encoding; they are reserved as words. No matter what encoding you use, there aren't two different ways of writing a single word: the word stays the same.
The programming language doesn't care about encoding; the compiler or interpreter does.