Tags: java, string, encoding, non-english

Print a non-English string in Java


When I print the movie name "Yôjinbô" (http://www.imdb.com/title/tt0055630/?ref_=chttp_tt_107), which contains some non-English characters, it is displayed incorrectly (the accented letters come out garbled) in the Eclipse console output window.

I cannot find any encoding setting in Eclipse or in the project properties. How can I print the movie name correctly?
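One way to make sure the program itself writes UTF-8 bytes, regardless of the platform default, is to wrap the output stream in a PrintStream with an explicit charset. The sketch below assumes Java 10 or later (for the PrintStream(OutputStream, boolean, Charset) constructor); the class and method names are hypothetical. It only controls the bytes that are written: the console must also decode them as UTF-8 (in Eclipse, the Encoding option on the Common tab of the run configuration), and it cannot repair a string that was already decoded incorrectly when it was read.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

public class Utf8Console {

    // Wraps an OutputStream in an auto-flushing PrintStream that always encodes
    // characters as UTF-8 instead of the platform default charset.
    // The PrintStream(OutputStream, boolean, Charset) constructor needs Java 10+.
    static PrintStream utf8PrintStream(OutputStream out) {
        return new PrintStream(out, true, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Wrap the process's standard output; "\u00f4" is the character ô,
        // escaped so the source file compiles the same under any encoding.
        PrintStream out = utf8PrintStream(new FileOutputStream(FileDescriptor.out));
        out.println("Y\u00f4jinb\u00f4");
    }
}
```

Passing such a stream to System.setOut(...) would make every later System.out.println call encode as UTF-8 too.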

-------------------- update -------------------

I found where the problem occurs. The following code fetches the movie info from omdbapi.com. When I print the line returned by reader.readLine(), the name is already wrong.

writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("movies/movie_" + i + ".txt"), "utf-8"));
sb = new StringBuilder();
ret = new StringBuilder();
title = URLEncoder.encode(movieNames[i], "UTF-8");
sb.append("http://www.omdbapi.com/?");
sb.append("t=").append(title).append("&");
sb.append("y=").append(year).append("&");
sb.append("plot=").append(plot).append("&");
sb.append("r=").append(r);
CloseableHttpClient client = HttpClients.createDefault();
String url = sb.toString();
HttpGet get = new HttpGet(url);
HttpResponse response = client.execute(get);
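// No charset is given, so the bytes are decoded with the platform default encoding
BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));

String line = reader.readLine(); // <-------- wrong here
while (line != null) {
    System.out.println(line);
    writer.write(line);
    writer.newLine(); // readLine() strips line terminators, so write one back
    line = reader.readLine();
}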

But when I paste the URL (http://www.omdbapi.com/?t=Y%C3%B4jinb%C3%B4&y=&plot=short&r=json) directly into Chrome, the response is correct.
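The symptom is consistent with a decoding mismatch. In UTF-8, ô is the two bytes C3 B4 (the same %C3%B4 visible in the URL), and since reading the response as UTF-8 fixes the problem, the response body is evidently UTF-8 as well. A reader that assumes a single-byte charset turns each of those bytes into a separate character. The sketch below reproduces this, assuming for illustration that the default charset is ISO-8859-1 (windows-1252 maps these two bytes the same way); the question does not say which default charset was actually in effect, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes text as UTF-8 (as the server does) and then decodes those bytes
    // with ISO-8859-1, imitating a reader that assumes a single-byte default charset.
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String title = "Y\u00f4jinb\u00f4"; // "Yôjinbô"
        String garbled = decodeUtf8BytesAsLatin1(title);
        // Each ô (bytes 0xC3 0xB4) becomes the two characters Ã (U+00C3) and ´ (U+00B4).
        System.out.println(garbled.equals("Y\u00c3\u00b4jinb\u00c3\u00b4")); // true
        System.out.println(garbled.length()); // 9
    }
}
```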

------------------- problem solved -------------------

The only thing I needed to do was to specify "UTF8" as the charset when creating the InputStreamReader, as follows:

BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF8"));
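Applied to the question's read loop, the fix might look like the sketch below. It uses StandardCharsets.UTF_8 (Java 7+) instead of the string "UTF8", which avoids the checked UnsupportedEncodingException and typos in charset names. A ByteArrayInputStream holding a hypothetical UTF-8 body stands in for response.getEntity().getContent(), so the example runs without a network connection; the class and method names are illustrative.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8ResponseReader {

    // Reads every line of the stream, decoding the bytes explicitly as UTF-8.
    // Line terminators are normalized to '\n'; the stream is closed afterwards.
    static String readAllUtf8(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for response.getEntity().getContent(): a hypothetical JSON body,
        // assumed to be UTF-8 encoded like the omdbapi.com response.
        byte[] body = "{\"Title\":\"Y\u00f4jinb\u00f4\"}".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAllUtf8(new ByteArrayInputStream(body)));
    }
}
```

With Apache HttpClient 4.x, EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8) is an alternative; there the charset is only a fallback, used when the response's Content-Type header does not declare one.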

Thanks for all your help.

------------------- more update -------------------

It turns out that FileReader and FileWriter implicitly use the system's default character encoding, which can cause the same kind of problem, so they should be replaced by alternatives that let you specify the encoding, for example:

// br = new BufferedReader(new FileReader(filename)); // <---- cause encoding problem here
br = new BufferedReader(new InputStreamReader(new FileInputStream(filename), "UTF-8"));
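The java.nio.file.Files helpers (Java 7+) also take the charset as an argument, so they are a compact replacement for FileReader and FileWriter. The sketch writes a title to a file and reads it back with UTF-8 named on both sides; the temporary file stands in for the question's movies/movie_<i>.txt files, and the class and method names are hypothetical. On Java 11 and later, FileReader and FileWriter also have constructors that accept a Charset, and since Java 18 (JEP 400) the default charset is UTF-8, but naming it explicitly keeps the code independent of the JDK version.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8Files {

    // Writes the text to the file and reads the first line back, naming UTF-8
    // explicitly on both sides instead of relying on the platform default.
    static String roundTrip(Path file, String text) {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for "movies/movie_" + i + ".txt".
        Path file = Files.createTempFile("movie_", ".txt");
        try {
            System.out.println(roundTrip(file, "Y\u00f4jinb\u00f4").equals("Y\u00f4jinb\u00f4")); // true
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```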

http://www.javapractices.com/topic/TopicAction.do?Id=42


Solution

  • Maybe this or this can help you.

    The web is full of other posts about your issue; search for them.