java encoding utf-8 character-encoding decoding

UTF-8 decoding problems


I am going crazy with UTF-8 decoding of some URLs. I am using

URLDecoder.decode (java.net.URLDecoder) 

to decode some URLs with special characters. As you can see below, the decoding works for some location names in the URL and not for others:

biha%C4%87 --> biha? (WRONG)
d%C3%A9partement+morbihan --> département morbihan (CORRECT)
gespanschaft+me%C4%91imurje --> gespanschaft me?imurje (WRONG)
hajd%C3%BA+bihar --> hajdú bihar (CORRECT)

Any ideas? I would highly appreciate any help! Thom


Solution

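  • Using URLDecoder.decode(url, "UTF-8"), all your URLs are decoded correctly.

    However, the decoded strings of cases 1 and 3 contain the characters ć (code point 263, U+0107) and đ (code point 273, U+0111).
    Most likely you printed these strings to a console whose encoding cannot represent characters with code points > 255 and which replaces those with a ?. The characters é (233) and ú (250) in cases 2 and 4 are below that limit, which is why they came out correctly.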
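
    One way to check this is to look at the code points of the decoded strings instead of trusting what the console shows. The following is only a minimal sketch: the class name UrlDecodeDemo and the helpers decodeUtf8 and codePoints are made up for illustration, and it assumes Java 8 or later and a terminal that can actually display UTF-8.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of the original question.
class UrlDecodeDemo {

    // Decodes a percent-encoded string, interpreting the escaped bytes as UTF-8.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Lists the code points of a string in U+XXXX form, so the result
    // can be verified independently of the console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String[] samples = {
            "biha%C4%87",
            "d%C3%A9partement+morbihan",
            "gespanschaft+me%C4%91imurje",
            "hajd%C3%BA+bihar"
        };
        // Write to stdout as UTF-8 instead of the platform default encoding.
        // Assumption: the terminal itself is configured to display UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        for (String sample : samples) {
            String decoded = decodeUtf8(sample);
            out.println(decoded + " -> " + codePoints(decoded));
        }
    }
}
```

    If the code point list for case 1 ends in U+0107 and the one for case 3 contains U+0111, the decoding itself is fine and only the output encoding needs to be changed.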