Why is a Reader in Java reading characters wrong?


So I have been trying to read different characters from a file with my own extension (.xs), but it doesn't seem to work. For example, it reads the character '¸' as 65533, although its character code is 184. Is it a problem with my code or with the encoding? (I am programming in IntelliJ.) Here is my code:

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

public class Main {
    public static void main(String[] args) throws IOException {
        Reader reading = new FileReader("ala.xs");

        int character;
        while ((character = reading.read()) != -1) {
            char ch = (char) character;
            System.out.println((int)ch);
        }

        reading.close();
    }
}

Here is the file ala.xs: "š ° otmkla¸HR8"

Here is the output of my program: "65533 32 65533 32 111 116 109 107 108 97 65533 72 82 56"

I tried changing the encoding, but it doesn't seem to work, and I am honestly losing hope. Is this error because the reader is reading wrong, or is it me?


Solution

  • Here is the file ala.xs: "š ° otmkla¸HR8"

    How did you produce and save it? 65533 is U+FFFD, the Unicode replacement character, which a decoder emits when it meets bytes that are not valid in the charset it is using. It appears several times in your output, which indicates an encoding problem. FileReader, as you use it, relies on the platform default charset, so you are currently assuming an encoding. Specify one explicitly when you read the file, for example with an InputStreamReader and UTF-8:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.nio.charset.StandardCharsets;

    public class Main {
        public static void main(String[] args) throws IOException {
            // Name the charset explicitly instead of relying on the platform default.
            // try-with-resources closes the reader even if read() throws.
            try (Reader reading = new InputStreamReader(new FileInputStream("ala.xs"), StandardCharsets.UTF_8)) {
                int character;
                // read() returns the decoded character as an int, or -1 at end of input
                while ((character = reading.read()) != -1) {
                    System.out.println(character);
                }
            }
        }
    }
    

    With the file saved correctly, the contents and a run of the program look like this:

    goose@t410:/tmp$ cat ala.xs;echo 
    š ° otmkla¸HR8
    goose@t410:/tmp$ java Main
    353
    32
    176
    32
    111
    116
    109
    107
    108
    97
    184
    72
    82
    56
    goose@t410:/tmp$ 
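
To see where the 65533 values in the question can come from, here is a small sketch. It assumes (this is a guess, the question does not say so) that ala.xs was saved in a single-byte Windows code page such as windows-1252, where š, ° and ¸ are stored as the bytes 0x9A, 0xB0 and 0xB8. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementDemo {
    // Decodes raw bytes as UTF-8 and returns each resulting char as an int.
    // new String(bytes, charset) replaces malformed input with U+FFFD (65533).
    public static int[] decodeAsUtf8(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        int[] values = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            values[i] = text.charAt(i);
        }
        return values;
    }

    public static void main(String[] args) {
        // Assumption: "š ° o" stored one byte per character, as windows-1252 would do.
        byte[] singleByte = {(byte) 0x9A, 0x20, (byte) 0xB0, 0x20, 0x6F};
        // 0x9A and 0xB0 are not valid UTF-8 on their own, so each becomes U+FFFD.
        System.out.println(Arrays.toString(decodeAsUtf8(singleByte)));
        // prints [65533, 32, 65533, 32, 111]

        // The same text encoded as real UTF-8 decodes to the expected code points.
        byte[] utf8 = "\u0161 \u00B0 o".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(decodeAsUtf8(utf8)));
        // prints [353, 32, 176, 32, 111]
    }
}
```

If that assumption holds, the first line matches the start of the output in the question, and the second matches the start of the run above.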
    

    Obviously, make sure the file is actually saved as UTF-8 in the first place. Reading with UTF-8 only helps if the bytes on disk are UTF-8.
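
One way to be sure of the encoding on disk is to write the file from Java with the same charset you read it with. The following is a minimal sketch of such a round trip, not the only way to do it. The class name, the method names and the temporary file are made up for illustration, and the text is written with Unicode escapes so the result does not depend on how the source file itself is encoded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Utf8RoundTrip {
    // Writes the text to the file as UTF-8 bytes.
    public static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the file as UTF-8 and returns every char as an int.
    public static List<Integer> readChars(Path file) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                values.add(c);
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temporary file standing in for ala.xs
        Path file = Files.createTempFile("ala", ".xs");
        // "š ° otmkla¸HR8" spelled with escapes for the non-ASCII characters
        writeUtf8(file, "\u0161 \u00B0 otmkla\u00B8HR8");
        System.out.println(readChars(file));
        // prints [353, 32, 176, 32, 111, 116, 109, 107, 108, 97, 184, 72, 82, 56]
        Files.delete(file);
    }
}
```

Note that a reader from Files.newBufferedReader throws an IOException on malformed bytes instead of substituting 65533, so a file saved in the wrong encoding fails loudly rather than producing replacement characters.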