android, utf-8, html-parsing

Turkish character problems while parsing (Android)


I am parsing HTML content and displaying the output on my screen. The website contains Turkish characters such as çÇşŞöÖğĞıİüÜ. I can't display them properly; so far they are printed as question marks.
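One likely cause, sketched here under the assumption that the page is served as ISO-8859-9 (the charset the answer below mentions) but decoded as UTF-8: each Turkish letter is a single byte of 0x80 or higher, which is not valid UTF-8 on its own, so the decoder substitutes U+FFFD, which many fonts and widgets display as a question mark. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decode the same bytes with the wrong and the right charset.
class WrongCharsetDemo {
    // Latin-5 (Turkish); assumed to be the page's charset.
    static final Charset TURKISH = Charset.forName("ISO-8859-9");

    // new String(byte[], Charset) replaces malformed input; for UTF-8 that is U+FFFD.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The 12 Turkish letters from the question, written as Unicode escapes
        // so the source file encoding does not matter.
        String original = "\u00e7\u00c7\u015f\u015e\u00f6\u00d6\u011f\u011e\u0131\u0130\u00fc\u00dc";
        // What an ISO-8859-9 page sends: one byte per letter, all >= 0x80.
        byte[] pageBytes = original.getBytes(TURKISH);

        String asUtf8 = decode(pageBytes, StandardCharsets.UTF_8);
        String asTurkish = decode(pageBytes, TURKISH);

        System.out.println(pageBytes.length);              // 12
        System.out.println(asUtf8.indexOf('\uFFFD') >= 0); // true: letters replaced
        System.out.println(asTurkish.equals(original));    // true: round-trips
    }
}
```

The bytes themselves are fine; only the charset used to turn them into a String decides whether the letters survive.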

My current setting is: Eclipse -> Project -> Properties -> Resource -> Text File Encoding = Inherited from container (Cp1254)

I searched web and found this solution:

Eclipse -> Project -> Properties -> Resource -> Text File Encoding = Other: UTF-8

However, it doesn't work: it only changes how the characters in my own files are stored (some of my activities have titles containing these characters). The parsed text still shows question marks.
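The Text File Encoding setting only tells Eclipse how to read and save your source and resource files; it has no effect on how bytes downloaded at runtime are turned into Strings. That conversion happens wherever the HTTP response is read, so that is where the page's charset has to be supplied. A minimal sketch, assuming the page is ISO-8859-9 and that the body arrives as an InputStream; readBody and ResponseReader are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper: read a whole response body with an explicit charset
// instead of relying on the platform default (UTF-8 on Android).
class ResponseReader {
    static String readBody(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Charset turkish = Charset.forName("ISO-8859-9"); // assumed page charset
        // Simulated page body: C-cedilla, dotless i, soft g, written as escapes.
        String expected = "\u00c7\u0131\u011f";
        byte[] body = expected.getBytes(turkish);

        String text = readBody(new ByteArrayInputStream(body), turkish);
        System.out.println(text.equals(expected)); // true
    }
}
```

On a real connection you would pass `conn.getInputStream()` from an `HttpURLConnection`; if the server declares a charset in its Content-Type header (`conn.getContentType()`), prefer that over a hard-coded value.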

Any help? Thanks in advance...


Solution

  • OK, I finally found a real solution. Depending on where you are parsing from (I am retrieving data from a page with charset=iso-8859-9, while Eclipse uses UTF-8), you have to do character replacements. In my case:

        // Replace each garbled sequence (first argument) with the correct letter (second).
        // As printed here, each pattern equals its replacement; use the garbled form
        // that actually appears in your parsed text. replace() is a literal, non-regex swap.
        context = context.replace("İ", "İ");
        context = context.replace("ı", "ı");
        context = context.replace("Ö", "Ö");
        context = context.replace("ö", "ö");
        context = context.replace("Ü", "Ü");
        context = context.replace("ü", "ü");
        context = context.replace("Ç", "Ç");
        context = context.replace("ç", "ç");
        context = context.replace("Ğ", "Ğ");
        context = context.replace("ğ", "ğ");
        context = context.replace("Ş", "Ş");
        context = context.replace("ş", "ş");
    

    where context is a String that holds all of the parsed data and is later displayed in a TextView. That's all. I should have thought of this much earlier!
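If the garbling comes from UTF-8 bytes being decoded as ISO-8859-9 (İ, whose UTF-8 bytes are C4 B0, then appears as "Ä°"), the whole replacement table can be replaced by reversing that one mistake. This is a sketch under that assumption, not the answer's method; MojibakeRepair and repair are hypothetical names. It relies on ISO-8859-9 mapping every byte value to exactly one character, so re-encoding the garbled text recovers the original bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: undo text that was UTF-8 on the wire but decoded as ISO-8859-9.
class MojibakeRepair {
    static final Charset TURKISH = Charset.forName("ISO-8859-9");

    static String repair(String garbled) {
        // Turn the wrongly decoded chars back into the original bytes...
        byte[] originalBytes = garbled.getBytes(TURKISH);
        // ...and decode those bytes with the charset they were really in.
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Cig Istanbul" with Turkish letters, written as Unicode escapes.
        String correct = "\u00c7\u0131\u011f \u0130stanbul";
        // Simulate the mistake: UTF-8 bytes decoded as ISO-8859-9.
        String garbled = new String(correct.getBytes(StandardCharsets.UTF_8), TURKISH);

        System.out.println(garbled.equals(correct));         // false
        System.out.println(repair(garbled).equals(correct)); // true
    }
}
```

This only works if the garbled string has not passed through another lossy step: text that already contains U+FFFD replacement characters (the question-mark symptom from the question) cannot be recovered this way, so decoding with the correct charset when the page is read remains the more reliable fix.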