java utf-8 inputstream

Reading InputStream as UTF-8


I'm trying to read from a text/plain file over the internet, line-by-line. The code I have right now is:

URL url = new URL("http://kuehldesign.net/test.txt");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
LinkedList<String> lines = new LinkedList<String>();
String readLine;

while ((readLine = in.readLine()) != null) {
    lines.add(readLine);
}

for (String line : lines) {
    out.println("> " + line);
}

The file, test.txt, contains ¡Hélló!, which I am using in order to test the encoding.

When I review the OutputStream (out), I see it as > ¬°H√©ll√≥!. I don't believe this is a problem with the OutputStream since I can do out.println("é"); without problems.

Any ideas for reading from the InputStream as UTF-8? Thanks!


Solution

  • Solved my own problem. This line:

    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
    

    needs to be:

    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
    

    or since Java 7:

    BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
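
For anyone who wants to see the difference without a network connection, here is a minimal sketch. The class name `Utf8ReadDemo` and the helper `readLines` are made up for illustration, and a `ByteArrayInputStream` stands in for `url.openStream()`. ISO-8859-1 is used as an arbitrary example of a wrong charset; it is not necessarily the one that produced the output in the question. The one-argument `InputStreamReader` constructor falls back to the JVM's default charset, which is why the original code only works on machines where that default happens to be UTF-8.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8ReadDemo {

    // Hypothetical helper: reads every line of the stream, decoding bytes with the given charset.
    static List<String> readLines(InputStream stream, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the underlying stream) when done
        try (BufferedReader in = new BufferedReader(new InputStreamReader(stream, charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // The test string from the question, written with Unicode escapes so the
        // source file's own encoding does not matter.
        String expected = "\u00A1H\u00E9ll\u00F3!";

        // Stand-in for the remote file: the same text encoded as UTF-8 bytes.
        byte[] utf8Bytes = (expected + "\n").getBytes(StandardCharsets.UTF_8);

        List<String> good = readLines(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        List<String> bad = readLines(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        System.out.println(expected.equals(good.get(0))); // true: decoded with the right charset
        System.out.println(expected.equals(bad.get(0)));  // false: each multi-byte character became two
    }
}
```

One practical difference between the two fixes above: the `"UTF-8"` string overload declares the checked `UnsupportedEncodingException`, while the `StandardCharsets.UTF_8` overload does not, so the latter is usually the more convenient choice on Java 7 and later.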