I want to do some language detection using the Python package TextBlob. I created a new column in a pandas DataFrame that should contain the detected language:
from textblob import TextBlob
posts['Language']=posts['Caption'].apply(TextBlob.detect_language)
This code works. However, with one DataFrame it stops and throws an exception ('TranslatorError') when the respective row contains fewer than 3 characters. Therefore, I'd like to write a function that ensures 'TextBlob.detect_language' gets applied to the full DataFrame even when an exception occurs.
I thought about something like this:
def get_language(r):
    try:
        return r.TextBlob.detect_language()
    # except (r.TextBlob.detect_language==TranslatorError):
        return np.nan # where textblob was not able to detect language -> nan
However, I don't know what to write after the (commented-out) "except" clause. Any help?
Applying the current function (with the except clause uncommented)
posts['Language']=posts['Caption'].apply(get_language)
raises
AttributeError: 'TextBlob' object has no attribute 'TextBlob'
If I try
def get_language(r):
    try:
        return r.TextBlob.detect_language()
    except:
        pass # (or np.nan)
it just passes over every row, i.e. it doesn't detect the language for any row.
Thanks for help guys!
See below:
from textblob import TextBlob
import pandas

def detect_language(text):
    try:
        # Build a TextBlob from the caption, then ask it for the language
        b = TextBlob(text)
        return b.detect_language()
    except Exception:
        # Detection failed (e.g. the caption is too short)
        return "Language Not Detected"

df = pandas.DataFrame(data=[("na", "hello"), ("na", "bonjour"), ("na", "_")], columns=['Language', 'Caption'])
df['Language'] = df['Caption'].apply(detect_language)
df
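A catch-all except also hides unrelated bugs, which is why the second attempt in the question returned nothing for every row: the AttributeError was swallowed silently. Here is a minimal sketch of catching only the expected failure and returning NaN instead. Note that `detect` and `DetectionError` are hypothetical stand-ins I made up so the example runs without TextBlob or a network connection; they are not part of any library.

```python
import math


class DetectionError(Exception):
    """Hypothetical stand-in for the library's 'could not detect' error."""


def detect(text):
    """Hypothetical stand-in detector: fails on text shorter than 3 characters."""
    if len(text) < 3:
        raise DetectionError("text too short")
    return "fr" if text == "bonjour" else "en"


def safe_detect(text, default=math.nan):
    """Return the detected language, or `default` when detection fails."""
    try:
        return detect(text)
    except DetectionError:
        # Only the expected failure is handled here; an unrelated bug
        # (e.g. an AttributeError) still surfaces instead of being hidden.
        return default


captions = ["hello", "bonjour", "_"]
languages = [safe_detect(c) for c in captions]
print(languages)
```

With the real data you would call posts['Caption'].apply(safe_detect), replacing the body of `detect` with TextBlob(text).detect_language() and `DetectionError` with textblob.exceptions.TranslatorError, assuming your installed TextBlob version still provides them.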