c# encoding

Converting unknown characters to Greek characters


I have a file which contains the following characters:

ÇËÅÊÔÑÏÖÏÑÇÓÇ ÁÉÌÏÓÖÁÉÑÉÍÇÓ

I am trying to convert that to Greek words and the result should be:

ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΑΙΜΟΣΦΑΙΡΙΝΗΣ

The file in which the above value is stored is in Unicode format.

I have tried every available encoding, but none of them gives the correct conversion.

private void Convert()
{
    string textFilePhysicalPath = @"C:\Users\Nec\Desktop\a.txt";
    string contents = File.ReadAllText(textFilePhysicalPath);

    List<string> sLines = new List<string>();
    // Try a round trip through every encoding the runtime knows about.
    foreach (EncodingInfo ei in Encoding.GetEncodings())
    {
        Encoding iso = ei.GetEncoding();
        Encoding utfx = Encoding.Unicode;
        byte[] utfBytes = utfx.GetBytes(contents);
        byte[] isoBytes = Encoding.Convert(utfx, iso, utfBytes);
        string msg = iso.GetString(isoBytes);

        sLines.Add(ei.Name + " " + msg);
    }

    // Write one line per encoding: its name followed by the converted text.
    using (StreamWriter file = new StreamWriter(@"C:\Users\Nec\Desktop\result.txt"))
    {
        foreach (var line in sLines)
            file.WriteLine(line);
    }
}

A website that converts it correctly is http://www.online-decoder.com/el, but even when I convert from ISO-8859-1 to ISO-8859-7, it still doesn't work in .NET.


Solution

  • This code takes the C# string (which is UTF-16 in memory), encodes it to 8-bit bytes using the common ISO-8859-1 code page, and then decodes those bytes back into a UTF-16 string using the Greek code page windows-1253. The result is ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΑΙΜΟΣΦΑΙΡΙΝΗΣ, as you want.

    // The text as it appears when the Greek bytes are misread as ISO-8859-1.
    string erroneousString = "ÇËÅÊÔÑÏÖÏÑÇÓÇ ÁÉÌÏÓÖÁÉÑÉÍÇÓ";
    // Recover the original bytes, then decode them with the Greek code page.
    byte[] asIso88591Bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(erroneousString);
    string asGreekString = Encoding.GetEncoding("windows-1253").GetString(asIso88591Bytes);
    Console.OutputEncoding = System.Text.Encoding.UTF8;
    Console.WriteLine(asGreekString);
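
As a self-contained variant of the snippet above, here is a minimal sketch that wraps the same round trip in a helper. The class and method names (`GreekMojibake`, `Repair`) are made up for illustration. It also assumes you may be running on .NET Core or .NET 5+, where legacy code pages such as windows-1253 have to be registered through `CodePagesEncodingProvider` before `Encoding.GetEncoding` can find them; on the classic .NET Framework that registration line should not be needed.

```csharp
using System;
using System.Text;

public static class GreekMojibake
{
    // Hypothetical helper (not part of the original answer): repairs text that
    // was decoded as ISO-8859-1 although its bytes were really windows-1253.
    public static string Repair(string garbled)
    {
        // Assumption: .NET Core / .NET 5+, where windows-1253 is only
        // available after the code pages provider has been registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        Encoding greek = Encoding.GetEncoding("windows-1253");

        // ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
        // so encoding with it recovers the original bytes.
        byte[] rawBytes = latin1.GetBytes(garbled);

        // Decode those same bytes with the encoding they were written in.
        return greek.GetString(rawBytes);
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        // prints ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΑΙΜΟΣΦΑΙΡΙΝΗΣ
        Console.WriteLine(Repair("ÇËÅÊÔÑÏÖÏÑÇÓÇ ÁÉÌÏÓÖÁÉÑÉÍΗΣ".Replace("ΗΣ", "ÇÓ")));
    }
}
```

The important detail is that two different encodings are involved: one to get back to the raw bytes and another to interpret them. The loop in the question encodes and decodes with the same encoding in every iteration, so each pass can only give back the original characters (or `?` where the encoding cannot represent them).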
    

    Edit: Since your file is encoded in an 8-bit format, you need to specify the codepage when reading it. Use this:

    string fileContents = File.ReadAllText("189.dat", Encoding.GetEncoding("windows-1253"));
    Console.OutputEncoding = System.Text.Encoding.UTF8;
    Console.WriteLine(fileContents);
    

    That reads the content as:

    'CS','C.S.F. EXAMINATION','ΕΞΕΤΑΣΗ Ε.Ν.Υ.'
    'EH','Hb ELECTROPHORESIS','ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΑΙΜΟΣΦΑΙΡΙΝΗΣ'
    'EP','PROTEIN ELECTROPHORESIS','ΗΛΕΚΤΡΟΦΟΡΗΣΗ ΠΡΩΤΕΙΝΩΝ'
    'FB','HAEMATOLOGY - FBC','ΓΕΝΙΚΗ ΕΞΕΤΑΣΗ ΑΙΜΑΤΟΣ - FBC'
    'FR','FREE TEXT',
    'GT','GLUCOSE TOLERANCE TEST','ΔΟΚΙΜΑΣΙΑ ΑΝΟΧΗΣ ΓΛΥΚΟΖΗΣ'
    'MI','MICROBIOLOGY','ΜΙΚΡΟΒΙΟΛΟΓΙΑ'
    'NO','NORMAL FORM','ΚΑΝΟΝΙΚΟ ΔΕΛΤΙΟ'
    'RE','RENAL CALCULUS','ΧΗΜΙΚΗ ΑΝΑΛΥΣΗ ΟΥΡΟΛΙΘΟΥ'
    'SE','SEMEN ANALYSIS','ΣΠΕΡΜΟΔΙΑΓΡΑΜΜΑ'
    'SP','SPECIAL PATHOLOGY','SPECIAL PATHOLOGY'
    'ST','STOOL EXAMINATION
    ','ΕΞΕΤΑΣΗ ΚΟΠΡΑΝΩΝ'
    'SW','SEMEN WASH','SEMEN WASH'
    'TH','THROMBOPHILIA PANEL','THROMBOPHILIA PANEL'
    'UR','URINE ANALYSIS','ΓΕΝΙΚΗ ΕΞΕΤΑΣΗ ΟΥΡΩΝ'
    'WA','WATER CULTURE REPORT','ΑΝΑΛΥΣΗ ΝΕΡΟΥ'
    'WI','WIDAL ','ΑΝΟΣΟΒΙΟΛΟΓΙΑ'
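
If you want to try the file-reading approach without the original data file, the following is a rough sketch under a few assumptions: the helper name `ReadAsGreek` is invented for this example, a small temporary file stands in for the real one, and the provider registration is only there in case you are on .NET Core or .NET 5+.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadGreekFile
{
    // Hypothetical helper: reads a file whose bytes are windows-1253 text.
    public static string ReadAsGreek(string path)
    {
        // Assumption: needed on .NET Core / .NET 5+ so that windows-1253 resolves.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return File.ReadAllText(path, Encoding.GetEncoding("windows-1253"));
    }

    public static void Main()
    {
        // Stand-in for the real data file: the bytes of "ΑΙΜΑ" in windows-1253.
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0xC1, 0xC9, 0xCC, 0xC1 });
        try
        {
            Console.OutputEncoding = Encoding.UTF8;
            Console.WriteLine(ReadAsGreek(path)); // prints ΑΙΜΑ
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Passing the encoding to `File.ReadAllText` matters because the overload without an encoding argument falls back to UTF-8 when the file has no byte order mark, and these bytes are not valid UTF-8.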