
Exception handling when applying function to pandas df


I want to do some language detection using the Python package textblob. I created a new column in a pandas DataFrame which should contain the detected language:

from textblob import TextBlob
posts['Language']=posts['Caption'].apply(TextBlob.detect_language)

This code works. However, on one DataFrame it stops and raises an exception ('TranslatorError') when a row contains fewer than 3 characters. I'd therefore like to write a function which ensures that 'TextBlob.detect_language' gets applied to the whole DataFrame even when an exception occurs.

I thought about something like this:

def get_language(r):
    try:
        return r.TextBlob.detect_language()
    # except (r.TextBlob.detect_language==TranslatorError):
        return np.nan # where textblob was not able to detect language -> nan

However, I don't know what to write in the (commented-out) "except" clause. Any help?

Applying the current function (with the except clause uncommented)

posts['Language']=posts['Caption'].apply(get_language)

returns

AttributeError: 'TextBlob' object has no attribute 'TextBlob'

If I try

def get_language(r):
    try:
        return r.TextBlob.detect_language()
    except:
        pass # (or np.nan)

it just passes over all the rows, i.e. it doesn't detect the language for any row...

Thanks for the help, guys!


Solution

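  • The attempts above fail because r.TextBlob looks up an attribute named TextBlob on each value, and no such attribute exists. With a bare except, that AttributeError is swallowed on every row, so no language is ever returned. Build the TextBlob inside the function instead (this assumes the column holds plain strings):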

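    from textblob import TextBlob
    import pandas as pd

    def detect_language(text):
        """Return the detected language code, or a placeholder on failure."""
        try:
            return TextBlob(text).detect_language()
        except Exception:
            # e.g. TranslatorError for captions shorter than 3 characters
            return "Language Not Detected"

    # Example frame; the third caption is too short to detect
    df = pd.DataFrame(
        data=[("na", "hello"), ("na", "bonjour"), ("na", "_")],
        columns=["Language", "Caption"],
    )
    df["Language"] = df["Caption"].apply(detect_language)
    print(df)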
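
If you would rather catch only the expected error and get NaN back, as in the question, one option is a small wrapper that turns chosen exceptions into a default value. The following is only a sketch: `with_fallback`, `TooShortError` and `fake_detect` are hypothetical names made up for illustration, and `fake_detect` merely stands in for the real detector so the example runs without textblob installed.

```python
import math


def with_fallback(func, exceptions=(Exception,), default=float("nan")):
    """Wrap func so the listed exceptions yield `default` instead of propagating."""
    def wrapper(value):
        try:
            return func(value)
        except exceptions:
            return default
    return wrapper


class TooShortError(Exception):
    """Hypothetical stand-in for the TranslatorError mentioned in the question."""


def fake_detect(text):
    """Hypothetical detector that rejects input shorter than 3 characters."""
    if len(text) < 3:
        raise TooShortError("need at least 3 characters")
    return "fr" if text == "bonjour" else "en"


# Only TooShortError is converted to NaN; any other error still surfaces.
safe_detect = with_fallback(fake_detect, exceptions=(TooShortError,))
results = [safe_detect(t) for t in ["hello", "bonjour", "_"]]
print(results)
```

With the real detector and its exception class swapped in, the wrapped function could be passed to `posts['Caption'].apply(...)` in the same way. Catching only the expected exception means a mistake such as the AttributeError from the question would still be raised rather than silently hidden.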