Search code examples
character-encodingcodepagesmojibake

how to convert unicode text to utf8 text readable?


I got a serious problem regarding Unicode and utf8, I saved a paragraph of Arabic/Persian text file into notepad and saved it, now I saw my information like

Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå 

my question is how to get back my data, it is important for me to get this data back, thanks in advance


Solution

  • The paragraph was scrambled by saving as code page 1256 (Arabic/Persian), then interpreted as code page 1252 (Western Europe), and finally saved as Unicode text. You can use C# to reverse this procedure:

    string scrambled = "Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ " + 
                       "Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå";
    byte[] bytes = Encoding.GetEncoding("windows-1252").GetBytes(scrambled);
    string plainText = Encoding.GetEncoding("windows-1256").GetString(bytes);
    Console.WriteLine(text);
    

    The plain text output is: "تو اين سورس برنامه عدد دلخواهي رو از ورودي ميگيره و به طول همون عدد مثلثي رو رسم مي کنه"