UTF-8 decoding problems

I am going crazy with UTF-8 decoding of some URLs. I am using

URLDecoder.decode ( 

to decode some URLs with special characters. As you can see below, for some location names in the URL the decoding works and for some it does not:

biha%C4%87 --> biha? (WRONG)
d%C3%A9partement+morbihan --> département morbihan (CORRECT)
gespanschaft+me%C4%91imurje --> gespanschaft me?imurje (WRONG)
hajd%C3%BA+bihar --> hajdú bihar (CORRECT)

Any ideas? I would highly appreciate it! Thom


  • Using URLDecoder.decode(url, "UTF-8"), all your URLs are decoded correctly.

    However, the decoded strings in cases 1 and 3 contain characters with code points 263 (ć) and 273 (đ).
    Most likely you printed these strings to a console that cannot display characters with code points above 255 and replaces them with a ?.
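
One way to check this is to look at the numeric code points of the decoded string instead of trusting what the console shows. The following is a minimal sketch, not a definitive fix: the class name UrlDecodeDemo and the helper decode are made up for illustration, and the UTF-8 PrintStream only helps if the terminal itself is configured to display UTF-8.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UrlDecodeDemo {

    // Hypothetical helper: decodes a URL-encoded string as UTF-8.
    // "UTF-8" is supported on every JVM, so the checked exception is
    // wrapped rather than propagated.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String decoded = decode("biha%C4%87");

        // Print each character's code point; this is plain ASCII output,
        // so it is not affected by the console's character set.
        for (int i = 0; i < decoded.length(); i++) {
            System.out.println(i + ": " + (int) decoded.charAt(i));
        }

        // Write the string itself as UTF-8 bytes instead of using the
        // platform default encoding (assumes the terminal renders UTF-8).
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(decoded);
    }
}
```

If the last code point printed for biha%C4%87 is 263, the decoding itself is fine and only the output encoding needs attention.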