
C# Google Translate without API and with Unicode


I want to translate a string into various languages with Google Translate in C#, without using the API. This is my code:

public string TranslateWithGoogle(string input, string languagePair)
{
    try
    {
        string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}", Uri.EscapeDataString(input), languagePair);
        WebClient webClient = new WebClient();
        webClient.Encoding = System.Text.Encoding.Default;
        string result = webClient.DownloadString(url);
        result = result.Substring(result.IndexOf("<span title=\"") + "<span title=\"".Length);
        result = result.Substring(result.IndexOf(">") + 1);
        result = result.Substring(0, result.IndexOf("</span>"));
        return result.Trim();
    }
    catch (Exception exc)
    {
        MessageBox.Show(exc.ToString());
        return string.Empty;
    }
        
}

To compare the C# output with what the browser shows, I use this test code:

string strSource_String = "Debug offline mode";
string strSource_Language = "en";
string str_It = TranslateWithGoogle(strSource_String, strSource_Language+"|it");
string str_Fr = TranslateWithGoogle(strSource_String, strSource_Language + "|fr");
string str_De = TranslateWithGoogle(strSource_String, strSource_Language + "|de");
string str_Ru = TranslateWithGoogle(strSource_String, strSource_Language + "|ru");
string str_Bg = TranslateWithGoogle(strSource_String, strSource_Language + "|bg");
string str_Cz = TranslateWithGoogle(strSource_String, strSource_Language + "|cz");
string str_Pl = TranslateWithGoogle(strSource_String, strSource_Language + "|pl");

The results, C# versus browser, are:

IT

C#: "Esegui il debug in modalità offline"

Browser: "Esegui il debug in modalità offline"

OK! and also the à char is correct

FR

C#: "Déboguer le mode hors connexion"

Browser: "Déboguer le mode hors connexion"

OK! and also the é char is correct

Russian

C#: "Ðåæèì îòëàäêè â àâòîíîìíîì ðåæèìå"

Browser: "Режим отладки в автономном режиме"

Wrong :-(

The same problem occurs with Bulgarian and Czech. I have tried changing webClient.Encoding = System.Text.Encoding.Default; to each of the other Encoding options, but that did not help.

Thanks for helping

Patrick


Solution

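  • If you check the header section of the returned HTML, you will see that it declares the charset "windows-1251", a code page for Cyrillic characters. You need to set webClient.Encoding to match. The garbled text in your Russian result is what windows-1251 bytes look like when they are decoded with a Western code page such as windows-1252, which is typically what Encoding.Default returns on a Western European Windows system. That is probably also why à and é came out correctly.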

    There may be better ways to get the charset before downloading the page, but if you are happy to download the page twice, you can check which charset the first response declares and, if it is "windows-1251", change the encoding and download again.

    Something like:

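    // First pass: the charset name is plain ASCII, so it is readable even if the rest is garbled.
    string result = webClient.DownloadString(url);
    if (result.Contains("windows-1251"))
    {
      // Cyrillic page: download again with the matching encoding.
      webClient.Encoding = System.Text.Encoding.GetEncoding("windows-1251");
      result = webClient.DownloadString(url);
    }
    else if (result.Contains("ISO-8859-2"))
    {
      // Central European page (Czech, Polish, ...): same approach.
      webClient.Encoding = System.Text.Encoding.GetEncoding("ISO-8859-2");
      result = webClient.DownloadString(url);
    }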
    

    You may want to modify it to check that "windows-1251" actually appears in the header section rather than anywhere in the page.
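
One possible way to avoid the second download is to fetch the raw bytes once with DownloadData and decode them yourself. The sketch below is only an illustration, not a tested drop-in fix: the class and method names (CharsetAwareDownload, EncodingFromContentType, DownloadDecoded) are made up for this example, and it assumes the server's HTTP Content-Type header names the same charset as the HTML header section, which you would need to verify for this page. If the header carries no usable charset it falls back to UTF-8.

```csharp
using System;
using System.Net;
using System.Text;

static class CharsetAwareDownload
{
    // Hypothetical helper: picks the encoding named in a Content-Type value such as
    // "text/html; charset=windows-1251". Falls back to UTF-8 when no usable charset is found.
    // Note: on .NET Core / .NET 5+ the legacy code pages (windows-1251, ISO-8859-2, ...)
    // are only available after Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
    public static Encoding EncodingFromContentType(string contentType)
    {
        const string marker = "charset=";
        if (!string.IsNullOrEmpty(contentType))
        {
            int start = contentType.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
            if (start >= 0)
            {
                string name = contentType.Substring(start + marker.Length);
                int end = name.IndexOf(';');
                if (end >= 0)
                {
                    name = name.Substring(0, end);
                }
                name = name.Trim().Trim('"');
                try
                {
                    return Encoding.GetEncoding(name);
                }
                catch (ArgumentException)
                {
                    // Unknown or unsupported charset name: use the fallback below.
                }
            }
        }
        return Encoding.UTF8;
    }

    // Downloads the page once as bytes, then decodes with the charset the server declared.
    // Assumption: the Content-Type response header agrees with the charset in the HTML head.
    public static string DownloadDecoded(string url)
    {
        using (var webClient = new WebClient())
        {
            byte[] body = webClient.DownloadData(url);
            WebHeaderCollection headers = webClient.ResponseHeaders;
            string contentType = headers != null ? headers["Content-Type"] : null;
            return EncodingFromContentType(contentType).GetString(body);
        }
    }
}
```

In TranslateWithGoogle you would then replace the webClient.Encoding and DownloadString lines with string result = CharsetAwareDownload.DownloadDecoded(url); and keep the Substring parsing as it is.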