
Scan input with UTF-8 in Java


I am trying to read typed input into a variable so that I can write it to a file.
The problem is that the Scanner doesn't recognize Czech letters such as "ř", "á", "ž", "š", etc.
Here is the code sample:

import java.util.Scanner;

String jmeno;
Scanner input = new Scanner(System.in, "utf-8");
jmeno = input.next();

What I type:

Šárka

What is stored in the variable jmeno:

??rka

The "?" stand for junk characters, displayed as question marks in black boxes.
How should I adjust the code so that the letters are received correctly by the variable?

Also, the computer I am using has EN (US) system encoding.


Solution

  • Use new Scanner(System.in). This decodes the input with the platform's default encoding, the same one that System.in delivers. Forcing "utf-8" on a console that sends bytes in a different encoding is most likely what garbles the accented letters. The String (as always) holds Unicode text, which you can then write to a file using

    new OutputStreamWriter(new FileOutputStream(...), "UTF-8")

    or other (simpler) methods.
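As a rough sketch of the advice above, the following program reads one token from standard input and writes it to a file as UTF-8. The class name NameToFile, the helpers readToken and writeUtf8, and the file name jmeno.txt are made up for illustration and are not from the original answer. It also assumes the console delivers bytes in the platform default charset, which may not hold on every setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Scanner;

public class NameToFile {

    // Hypothetical helper: reads one whitespace-delimited token,
    // decoding the incoming bytes with the given charset.
    static String readToken(InputStream in, Charset charset) {
        Scanner scanner = new Scanner(in, charset.name());
        return scanner.next();
    }

    // Hypothetical helper: writes the text encoded as UTF-8,
    // independent of the platform default charset.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer out = new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Charset.defaultCharset() is what new Scanner(System.in) uses.
        // Assumption: the console sends bytes in that charset.
        String jmeno = readToken(System.in, Charset.defaultCharset());

        // "jmeno.txt" is an illustrative output file name.
        writeUtf8(Paths.get("jmeno.txt"), jmeno);
    }
}
```

Separating the charset used for reading from the one used for writing keeps the two concerns independent: the input side has to match whatever the console sends, while the output file can always be UTF-8.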