I need a language detection script. I tried the TextBlob library, which currently gives me the two-letter abbreviation of the language. How can I get the full language name instead?
This detects the language and returns its two-letter abbreviation:
from textblob import TextBlob
b = TextBlob("cómo estás")
language = b.detect_language()
print(language)
Actual Results : es
Expected Results : Spanish
I have a list of languages and their abbreviations from this link:
https://developers.google.com/admin-sdk/directory/v1/languages
The code you're using gives you a two-letter abbreviation that conforms to the ISO 639-1 international standard (ISO 639-2 is the related standard for three-letter codes). You could look up a list of these correspondences (e.g. this page) and rig up a method that takes one and returns the other, but given that you're programming in Python, someone has already done that for you.
I recommend pycountry, a general-purpose library for this type of task that also covers a number of other standards. Example of using it for this problem:
from textblob import TextBlob
import pycountry
b = TextBlob("நீங்கள் எப்படி இருக்கிறீர்கள்")
iso_code = b.detect_language()
# iso_code = "ta"
language = pycountry.languages.get(alpha_2=iso_code)
# language = Language(alpha_2='ta', alpha_3='tam', name='Tamil', scope='I', type='L')
print(language.name)
and that prints Tamil, as expected. The same works for Spanish:
>>> pycountry.languages.get(alpha_2='es').name
'Spanish'
and probably for most other languages you'll encounter in whatever it is you're doing.
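If you'd rather not add a dependency, the "rig up a method" route mentioned above can be sketched roughly like this. It is only a minimal illustration: the names `LANGUAGE_NAMES` and `language_name` are made up for this example, the table holds just a handful of codes, and the handling of region suffixes (such as `en-GB`, which appear in lists like the Google one linked in the question) is an assumption about what you might need.

```python
# Hypothetical hand-rolled lookup; only a small subset of ISO 639-1 codes is listed.
LANGUAGE_NAMES = {
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "ta": "Tamil",
}


def language_name(code, default="Unknown"):
    """Return the English name for a two-letter code, or `default` if unlisted."""
    # Normalise case and drop any region suffix such as "-GB" or "_BR".
    base = code.strip().lower().replace("_", "-").split("-")[0]
    return LANGUAGE_NAMES.get(base, default)


print(language_name("es"))     # Spanish
print(language_name("en-GB"))  # English
```

You would have to fill in and maintain the table yourself, which is the main reason to prefer pycountry for anything beyond a few languages.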