How to detect the language of a string?


What's the best way to detect the language of a string?


Solution

  • If your code runs somewhere with internet access, you can use the Google API for language detection: http://code.google.com/apis/ajaxlanguage/documentation/

    var text = "¿Dónde está el baño?";
    google.language.detect(text, function(result) {
      if (!result.error) {
        var language = 'unknown';
        for (var l in google.language.Languages) {
          if (google.language.Languages[l] == result.language) {
            language = l;
            break;
          }
        }
        var container = document.getElementById("detection");
        container.innerHTML = text + " is: " + language + "";
      }
    });
    

    And, since you are using C#, take a look at this article on how to call the API from C#.

    UPDATE: That C# link is gone; here's a cached copy of the core of it:

    string s = TextBoxTranslateEnglishToHebrew.Text;
    string key = "YOUR GOOGLE AJAX API KEY";
    GoogleLangaugeDetector detector =
       new GoogleLangaugeDetector(s, VERSION.ONE_POINT_ZERO, key);
    
    GoogleTranslator gTranslator = new GoogleTranslator(s, VERSION.ONE_POINT_ZERO,
       detector.LanguageDetected.Equals("iw") ? LANGUAGE.HEBREW : LANGUAGE.ENGLISH,
       detector.LanguageDetected.Equals("iw") ? LANGUAGE.ENGLISH : LANGUAGE.HEBREW,
       key);
    
    TextBoxTranslation.Text = gTranslator.Translation;
    

    Basically, you need to build a URI like the following and send it to Google:

    http://ajax.googleapis.com/ajax/services/language/translate?v=1.0&q=hello%20world&langpair=en%7ciw&key=your_google_api_key_goes_here

    This tells the API that you want to translate "hello world" from English to Hebrew. Google's JSON response would look like:

    {"responseData": {"translatedText":"שלום העולם"}, "responseDetails": null, "responseStatus": 200}
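
    The request URI above can be assembled with a small helper. The following is a minimal sketch, not the article's code: TranslateUrlSketch and BuildUrl are hypothetical names, and Uri.EscapeDataString is used here as one reasonable way to percent-encode each query value.

    ```csharp
    using System;

    public static class TranslateUrlSketch
    {
        // Base endpoint copied from the example URI above.
        private const string BaseUrl =
            "http://ajax.googleapis.com/ajax/services/language/translate";

        // BuildUrl is a hypothetical helper, not part of the original article.
        public static string BuildUrl(string text, string from, string to, string key)
        {
            // Uri.EscapeDataString percent-encodes spaces, '|', '&' and other
            // reserved characters, so each value is safe inside a query string.
            return string.Format(
                "{0}?v=1.0&q={1}&langpair={2}&key={3}",
                BaseUrl,
                Uri.EscapeDataString(text),
                Uri.EscapeDataString(from + "|" + to),
                Uri.EscapeDataString(key));
        }

        public static void Main()
        {
            Console.WriteLine(
                BuildUrl("hello world", "en", "iw", "your_google_api_key_goes_here"));
        }
    }
    ```

    Run as-is, this should print the example URI with the pipe encoded as %7C. Percent-encoding hex digits are case-insensitive, so that is equivalent to the %7c shown above.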
    

    I chose to make a base class that represents a typical Google JSON response:

    [Serializable]
    public class JSONResponse
    {
       public string responseDetails = null;
       public string responseStatus = null;
    }
    

    Then, a Translation object that inherits from this class:

    [Serializable]
    public class Translation: JSONResponse
    {
       public TranslationResponseData responseData = 
        new TranslationResponseData();
    }
    

    This Translation class has a TranslationResponseData object that looks like this:

    [Serializable]
    public class TranslationResponseData
    {
       public string translatedText;
    }
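
    To check this mapping without a network call, the sample response shown earlier can be fed straight to DataContractJsonSerializer. This is only a sketch under a few assumptions: TranslationResult, TranslationData and ParseSketch are hypothetical stand-ins for the classes above, they use [DataContract] properties instead of [Serializable] fields, and responseStatus is read as an int because the sample JSON carries it as a number.

    ```csharp
    using System;
    using System.IO;
    using System.Runtime.Serialization;
    using System.Runtime.Serialization.Json;
    using System.Text;

    // Hypothetical DTOs mirroring the shape of the sample JSON response.
    [DataContract]
    public class TranslationData
    {
        [DataMember(Name = "translatedText")]
        public string TranslatedText { get; set; }
    }

    [DataContract]
    public class TranslationResult
    {
        [DataMember(Name = "responseData")]
        public TranslationData ResponseData { get; set; }

        [DataMember(Name = "responseDetails")]
        public string ResponseDetails { get; set; }

        [DataMember(Name = "responseStatus")]
        public int ResponseStatus { get; set; }
    }

    public static class ParseSketch
    {
        // Deserializes one JSON response body into the DTOs above.
        public static TranslationResult Parse(string json)
        {
            var serializer = new DataContractJsonSerializer(typeof(TranslationResult));
            using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
            {
                return (TranslationResult)serializer.ReadObject(stream);
            }
        }

        public static void Main()
        {
            // The sample response from the text, as a C# string literal.
            string json = "{\"responseData\": {\"translatedText\":\"שלום העולם\"}, " +
                          "\"responseDetails\": null, \"responseStatus\": 200}";

            TranslationResult result = Parse(json);
            Console.WriteLine(result.ResponseStatus);
            Console.WriteLine(result.ResponseData.TranslatedText);
        }
    }
    ```

    Keeping the parsing step separate from the HTTP call like this makes it easy to test against saved responses.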
    

    Finally, we can make the GoogleTranslator class:

    using System;
    using System.Collections.Generic;
    using System.Text;
    
    using System.Web;
    using System.Net;
    using System.IO;
    using System.Runtime.Serialization.Json;
    
    namespace GoogleTranslationAPI
    {
    
       public class GoogleTranslator
       {
          private string _q = "";
          private string _v = "";
          private string _key = "";
          private string _langPair = "";
          private string _requestUrl = "";
          private string _translation = "";
    
          public GoogleTranslator(string queryTerm, VERSION version, LANGUAGE languageFrom,
             LANGUAGE languageTo, string key)
          {
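             // Escape the query text as a query-string value: spaces become %20
             // and reserved characters such as '&' and '#' are encoded too.
             _q = Uri.EscapeDataString(queryTerm);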
             _v = HttpUtility.UrlEncode(EnumStringUtil.GetStringValue(version));
             _langPair =
                HttpUtility.UrlEncode(EnumStringUtil.GetStringValue(languageFrom) +
                "|" + EnumStringUtil.GetStringValue(languageTo));
             _key = HttpUtility.UrlEncode(key);
    
             string encodedRequestUrlFragment =
                string.Format("?v={0}&q={1}&langpair={2}&key={3}",
                _v, _q, _langPair, _key);
    
             _requestUrl = EnumStringUtil.GetStringValue(BASEURL.TRANSLATE) + encodedRequestUrlFragment;
    
             GetTranslation();
          }
    
          public string Translation
          {
             get { return _translation; }
             private set { _translation = value; }
          }
    
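          private void GetTranslation()
          {
             try
             {
                WebRequest request = WebRequest.Create(_requestUrl);
    
                // Dispose the response and reader once the body has been read.
                using (WebResponse response = request.GetResponse())
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                   // Read the whole body, not just its first line.
                   string json = reader.ReadToEnd();
                   using (MemoryStream ms = new MemoryStream(Encoding.Unicode.GetBytes(json)))
                   {
                      DataContractJsonSerializer ser =
                         new DataContractJsonSerializer(typeof(Translation));
                      Translation translation = ser.ReadObject(ms) as Translation;
    
                      // responseData can be missing when the API reports an error.
                      if (translation != null && translation.responseData != null)
                         _translation = translation.responseData.translatedText;
                   }
                }
             }
             // Any failure is swallowed here, leaving Translation as an empty string.
             catch (Exception) { }
          }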
       }
    }
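
    The cached copy does not include the GoogleLangaugeDetector class, which is the part that answers the original question. A rough sketch of what the detection side could look like follows. It rests on assumptions that do not appear above: that a detect endpoint sits next to the translate one at /ajax/services/language/detect, and that its responseData carries language, isReliable and confidence fields. All type and method names are hypothetical, and the sample JSON is made up for illustration. Google has since retired this AJAX API, so the sketch parses a canned response instead of making a request.

    ```csharp
    using System;
    using System.IO;
    using System.Runtime.Serialization;
    using System.Runtime.Serialization.Json;
    using System.Text;

    // ASSUMPTION: field names of the detect response; not shown in the article.
    [DataContract]
    public class DetectionData
    {
        [DataMember(Name = "language")] public string Language { get; set; }
        [DataMember(Name = "isReliable")] public bool IsReliable { get; set; }
        [DataMember(Name = "confidence")] public double Confidence { get; set; }
    }

    [DataContract]
    public class DetectionResult
    {
        [DataMember(Name = "responseData")] public DetectionData ResponseData { get; set; }
        [DataMember(Name = "responseStatus")] public int ResponseStatus { get; set; }
    }

    public static class DetectSketch
    {
        // ASSUMPTION: endpoint path modelled on the translate URI in the text.
        private const string BaseUrl =
            "http://ajax.googleapis.com/ajax/services/language/detect";

        public static string BuildUrl(string text, string key)
        {
            return string.Format("{0}?v=1.0&q={1}&key={2}",
                BaseUrl, Uri.EscapeDataString(text), Uri.EscapeDataString(key));
        }

        // Returns the detected language code, or null if the response is not a success.
        public static string ParseLanguage(string json)
        {
            var serializer = new DataContractJsonSerializer(typeof(DetectionResult));
            using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(json)))
            {
                var result = serializer.ReadObject(stream) as DetectionResult;
                if (result == null || result.ResponseStatus != 200 || result.ResponseData == null)
                    return null;
                return result.ResponseData.Language;
            }
        }

        public static void Main()
        {
            Console.WriteLine(BuildUrl("¿Dónde está el baño?", "your_google_api_key_goes_here"));

            // Made-up sample response, not captured from the real service.
            string json = "{\"responseData\": {\"language\":\"es\",\"isReliable\":true," +
                          "\"confidence\":0.5}, \"responseDetails\": null, \"responseStatus\": 200}";
            Console.WriteLine(ParseLanguage(json));
        }
    }
    ```

    Returning null for a failed response lets the caller decide what to do, instead of silently falling back to a default language.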