
Code is not decoding German characters from the Google Books API correctly


I have written a small app that searches Google Books and displays the results in a neat but simple fashion. Everything works so far, but there is an issue right at the source: although Google correctly returns German-language search results, the special German characters (Ä, Ö, Ü and probably ß) show up as the "�" placeholder or sometimes just "?".

I was able to confirm that the JSONObject built from the InputStream already contains those errors, so it seems the original input stream from Google is not being read correctly. The odd part is that I pass "UTF-8" (which covers German characters) to my InputStreamReader, but apparently to no avail.

Here is the HTTP request procedure I am using:

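import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class HttpRequest {

    // Fetches the response body of urlString as a single string, decoded as UTF-8.
    public static String request(String urlString) throws IOException {
        URL url = new URL(urlString);
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(5000);  // milliseconds
        connection.setReadTimeout(10000);    // milliseconds
        StringBuilder builder = new StringBuilder();
        // try-with-resources closes the reader even if reading throws
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                builder.append(inputLine);  // note: readLine() drops the line terminators
            }
        }
        return builder.toString();
    }
}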

What else could be going wrong? I already checked the StringBuilder, but the errors are already present in the lines read from the BufferedReader. I was also unable to find any language- or encoding-specific settings in the official Google Books API guide, so I assume the responses come in a universal encoding. But then the "UTF-8" setting should decode them correctly, shouldn't it?


Solution

  • The easiest approach is to check the raw data another way, for example in a browser. Inspecting a Google Books API response in the browser is simple: open the URL and the response comes back as JSON. A JSON viewer plugin is optional and not needed for this.

    For example, use this URL:

    https://www.googleapis.com/books/v1/volumes?q=Latein+key=NO
    

    Checking the HTTP headers (for example in the browser developer tools), you can see that the response declares the expected encoding:

    content-type: application/json; charset=UTF-8
    

    Look at the content of some German results: the special characters are correct for some books but not for all, depending on the book in question.

    Conclusion: UTF-8 is indeed the correct encoding; for some books, the German characters are already missing or wrong in the source data.
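
If you would rather run the same check from Java than from the browser, something along these lines might help. It is only a rough sketch: the class `EncodingCheck`, its method names, and the three result strings are made up for illustration and are not part of any Google or Android API. The idea is to decode the raw response bytes strictly with the charset named in the Content-Type header. If strict decoding fails, the declared charset does not match the bytes. If it succeeds but the text still contains U+FFFD ("�"), the replacement character was already stored in the data before it reached your code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of any library.
final class EncodingCheck {

    // Reads the charset parameter from a Content-Type value.
    // Assumption: fall back to UTF-8 when no charset is declared.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Strict decode: malformed input raises an exception instead of
    // being silently replaced with U+FFFD.
    static boolean decodesCleanly(byte[] raw, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Returns "DECODING_MISMATCH", "DAMAGED_AT_SOURCE" or "OK" (illustrative labels).
    static String diagnose(byte[] raw, String contentType) {
        Charset cs = charsetOf(contentType);
        if (!decodesCleanly(raw, cs)) {
            return "DECODING_MISMATCH";   // bytes are not valid in the declared charset
        }
        if (new String(raw, cs).indexOf('\uFFFD') >= 0) {
            return "DAMAGED_AT_SOURCE";   // valid bytes that already encode U+FFFD
        }
        return "OK";
    }

    public static void main(String[] args) {
        String header = "application/json; charset=UTF-8";
        byte[] good = "B\u00FCcher".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "B\u00FCcher".getBytes(StandardCharsets.ISO_8859_1);
        byte[] damaged = "B\uFFFDcher".getBytes(StandardCharsets.UTF_8);
        System.out.println(diagnose(good, header));
        System.out.println(diagnose(latin1, header));
        System.out.println(diagnose(damaged, header));
    }
}
```

To apply it to a real response, you would read the body as bytes rather than through a Reader and pass in `connection.getContentType()`. A "DAMAGED_AT_SOURCE" result for a particular book would match what the browser shows: the request code is fine and the bad characters come from the book data itself.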