
How to get the full language name from the TextBlob language detection library in Python


I need a language detection script. I tried the TextBlob library, which currently gives me the two-letter abbreviation of the language. How can I get the full language name instead?

This detects the language and returns its two-letter abbreviation:

from textblob import TextBlob
b = TextBlob("cómo estás")
language = b.detect_language()
print(language)

Actual result: es
Expected result: Spanish

I have a list of languages and their abbreviations from this link:
https://developers.google.com/admin-sdk/directory/v1/languages


Solution

  • The code you're using gives you a two-letter abbreviation that conforms to the ISO 639-1 international standard. You could look up a list of these correspondences (e.g. this page) and rig up a method that takes one and returns the other, but given that you're programming in Python, someone has already done that for you.

    I recommend pycountry, a general-purpose library for this type of task that also covers a number of other standards. Here is an example of using it for this problem:

    from textblob import TextBlob
    import pycountry
    b = TextBlob("நீங்கள் எப்படி இருக்கிறீர்கள்")
    iso_code = b.detect_language()  
    # iso_code = "ta"
    language = pycountry.languages.get(alpha_2=iso_code)
    # language = Language(alpha_2='ta', alpha_3='tam', name='Tamil', scope='I', type='L')
    print(language.name)
    

    This prints Tamil, as expected. The same works for Spanish:

    >>> pycountry.languages.get(alpha_2='es').name
    'Spanish'
    

    It should also work for most other languages you're likely to encounter.
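
For comparison, the do-it-yourself route mentioned at the start of this answer (look up the correspondences and map one to the other) could be sketched roughly as below. This is only an illustration under stated assumptions: `LANGUAGE_NAMES` and `language_name` are hypothetical names, and the table holds just a handful of two-letter codes rather than the full list, which is the main reason a maintained library is usually the better choice.

```python
# Hypothetical hand-rolled table: a small subset of two-letter language codes.
# A real script would need the complete list (or a library such as pycountry).
LANGUAGE_NAMES = {
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "ta": "Tamil",
}


def language_name(iso_code, default="Unknown"):
    """Return the English name for a two-letter code, or `default` if it is not in the table."""
    return LANGUAGE_NAMES.get(iso_code.lower(), default)


print(language_name("es"))  # Spanish
print(language_name("ta"))  # Tamil
print(language_name("xx"))  # Unknown (code not in the table)
```

Returning a default for unknown codes is a design choice for this sketch; it keeps the script from crashing when the detector reports a language the table does not cover.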