Tags: python, google-translation-api

Why does the output contain NaN when I translate only one language (Malay) to English in Python?


First, I use fastText for language detection. Then, based on the detected language, I want to translate only the rows in a certain language (in this case, Malay) to English. For the translation, I use the Google Translate API from Python through the googletrans package. The problem is that the output contains NaN for the rows in other languages (in this case, English and Thai). I want to replace only the Malay text with its translation and leave the other rows unchanged.

from googletrans import Translator
import pandas as pd
import numpy as np
translator = Translator()


df = pd.DataFrame({
    'text': ["how are you", "suka makan nasi ayam", "สวัสด", "hai, apa khabar"],
    'lang': ["english", "malay", "thai", "malay"]
})
df

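Dataframe df:

text                       |    lang
-----------------------------------
how are you                |   english
suka makan nasi ayam       |   malay
สวัสด                      |   thai
hai, apa khabar            |   malay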

df1=df[df["lang"] == "malay"]
df['text'] = df1['text'].apply(translator.translate, dest='en').apply(getattr, args=('text',))
df

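Generated output:

text                       |    lang
-----------------------------------
NaN                        |   english
like to eat chicken rice   |   malay
NaN                        |   thai
Hello how are you          |   malay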

Desired output:

text                       |    lang
-----------------------------------
how are you                |   english
like to eat chicken rice   |   malay
สวัสด                      |   thai
Hello how are you          |   malay

Solution

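  • Assigning df1['text'] back to df['text'] aligns on the index, so every row that is not in df1 (the non-Malay rows) becomes NaN. Use a boolean mask so that only the Malay rows are read and overwritten: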

    translate_en = lambda x: translator.translate(x, dest='en').text
    
    m = df['lang'] == 'malay'
    df.loc[m, 'text'] = df.loc[m, 'text'].apply(translate_en)
    print(df)
    
    # Output
                           text     lang
    0               how are you  english
    1  like to eat chicken rice    malay
    2                     สวัสด     thai
    3         Hello how are you    malay
    

    Alternatively, use update, which overwrites only the rows present in the translated Series:

    df.update(df.loc[m, 'text'].apply(translate_en))
    print(df)
    
    # Output
                           text     lang
    0               how are you  english
    1  like to eat chicken rice    malay
    2                     สวัสด     thai
    3         Hello how are you    malay
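
To see why the original assignment produces NaN, and to try the fix without network access, here is a minimal sketch. It assumes a hypothetical stand-in, fake_translate_en, in place of translator.translate(x, dest='en').text; the lookup table and the make_df helper are illustrative only and are not part of googletrans or pandas.

```python
import pandas as pd

# Hypothetical offline stand-in for translator.translate(x, dest='en').text,
# so the example runs without calling the translation service.
FAKE_TRANSLATIONS = {
    "suka makan nasi ayam": "like to eat chicken rice",
    "hai, apa khabar": "Hello how are you",
}


def fake_translate_en(text):
    """Return the canned translation, or the text unchanged if unknown."""
    return FAKE_TRANSLATIONS.get(text, text)


def make_df():
    """Build a fresh copy of the DataFrame from the question."""
    return pd.DataFrame({
        "text": ["how are you", "suka makan nasi ayam", "สวัสด", "hai, apa khabar"],
        "lang": ["english", "malay", "thai", "malay"],
    })


# 1) Pattern from the question: the translated Series only carries index
#    labels 1 and 3, so assigning it to the whole column leaves 0 and 2 as NaN.
broken = make_df()
subset = broken[broken["lang"] == "malay"]
broken["text"] = subset["text"].apply(fake_translate_en)

# 2) Boolean mask on both sides: only the Malay rows are read and written.
fixed = make_df()
m = fixed["lang"] == "malay"
fixed.loc[m, "text"] = fixed.loc[m, "text"].apply(fake_translate_en)

# 3) Variant that keeps the question's structure: translate the subset,
#    then fill the missing rows from the original column.
filled = make_df()
translated = filled.loc[filled["lang"] == "malay", "text"].apply(fake_translate_en)
filled["text"] = translated.reindex(filled.index).fillna(filled["text"])

print(broken["text"].isna().tolist())  # [True, False, True, False]
print(fixed["text"].tolist())
print(filled["text"].tolist())
```

The third variant is closest to the code in the question: it still translates a filtered subset, but restores the untouched rows explicitly instead of letting index alignment turn them into NaN.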