
Encoding URLs containing unicode characters


Is there an Android class that (correctly) encodes URLs containing unicode characters? For example:

Blue Öyster Cult

Is converted to the following using java.net.URI:

uri.toString()
 (java.lang.String) Blue%20Öyster%20Cult

The Ö character is not encoded. Using URLEncoder:

URLEncoder.encode("Blue Öyster Cult", "UTF-8").toString()
 (java.lang.String) Blue+%C3%96yster+Cult

It encodes too much: spaces become "+" and path separators ("/") become %2F. If I click a link containing unicode characters in the Dolphin web browser, it works correctly, so this is clearly possible. But if I open an HttpURLConnection using any of the above strings, I get an HTTP 404 Not Found error.
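A minimal sketch reproducing the java.net.URI result above (the class and method names are hypothetical). The multi-argument URI constructors quote illegal characters such as spaces but leave non-ASCII characters like Ö untouched, which is why toString() still contains Ö. toASCIIString() additionally percent-encodes those characters as UTF-8, without turning spaces into "+" or slashes into %2F.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriAsciiDemo {
    // Builds a relative (path-only) URI with the four-argument constructor
    // URI(scheme, host, path, fragment). Illegal characters such as spaces
    // are quoted; non-ASCII characters such as U+00D6 are left as-is.
    static URI pathUri(String rawPath) {
        try {
            return new URI(null, null, rawPath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = pathUri("Blue \u00D6yster Cult");
        // toString(): spaces are %20, but U+00D6 is still a raw character
        System.out.println(uri.toString());
        // toASCIIString(): "Blue%20%C3%96yster%20Cult" (U+00D6 encoded as UTF-8)
        System.out.println(uri.toASCIIString());
    }
}
```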


Solution

  • I ended up hacking together a solution that seems to work for this, but is probably not the most robust:

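    URL url = new URL(userSuppliedPath); // throws MalformedURLException
    String protocol = url.getProtocol();
    String hostname = url.getHost();
    String thePath = url.getPath();      // note: the query string and fragment are not kept
    int port = url.getPort();            // -1 if the URL has no explicit port
    thePath = thePath.replaceAll("(^/|/$)", ""); // removes beginning/end slash
    // encodes unicode characters (throws UnsupportedEncodingException)
    String encodedPath = URLEncoder.encode(thePath, "UTF-8");
    encodedPath = encodedPath.replace("+", "%20"); // URLEncoder writes spaces as +; paths need %20
    encodedPath = encodedPath.replace("%2F", "/"); // change %2F back to slash
    String portPart = (port == -1) ? "" : ":" + port; // avoid emitting ":-1" when no port was given
    String urlString = protocol + "://" + hostname + portPart + "/" + encodedPath;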
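A possibly more robust alternative to the hack above, sketched under two assumptions: the input is a complete URL that is not already percent-encoded, and its host name is plain ASCII (internationalized host names would need something like java.net.IDN.toASCII, which this sketch does not do). It hands each component of the parsed URL to the seven-argument java.net.URI constructor and then calls toASCIIString(). Unlike the hack, it keeps the query string and fragment, preserves a trailing slash, and omits the port when none was given. The class and method names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UnicodeUrlEncoder {
    /**
     * Re-encodes a user-supplied URL so that it contains only US-ASCII.
     * Assumes the input is NOT already percent-encoded: the multi-argument
     * URI constructors always quote '%', so "%20" would become "%2520".
     */
    static String toAsciiUrl(String userSuppliedUrl) {
        try {
            URL url = new URL(userSuppliedUrl); // lenient parse: raw spaces are accepted
            URI uri = new URI(
                    url.getProtocol(),
                    url.getUserInfo(),
                    url.getHost(),
                    url.getPort(),   // -1 means "no port", so none is emitted
                    url.getPath(),
                    url.getQuery(),
                    url.getRef());
            // The constructor quoted spaces and other illegal characters;
            // toASCIIString() then percent-encodes non-ASCII ones as UTF-8.
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode URL: " + userSuppliedUrl, e);
        }
    }

    public static void main(String[] args) {
        // prints http://example.com/Blue%20%C3%96yster%20Cult
        System.out.println(toAsciiUrl("http://example.com/Blue \u00D6yster Cult"));
    }
}
```

The resulting string contains only ASCII, so it can be passed to new URL(...) and openConnection() without further encoding.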