Python pandas: Create a new column in English by translating values stored in a Traditional Chinese column


I have a column "City_trad_chinese" in a pandas dataframe "df" that contains values in Traditional Chinese. I need to create another column, "City_English", containing the same values translated into English.

How can I do this with Python? I tried the following:

#importing required libraries
import pandas as pd 

from os import path

from googletrans import Translator

#setting path to data
path2data = 'C:/Users/data'

# data import
df = pd.read_excel(path.join(path2data, 'data.xlsx'), converters={'City_trad_chinese':str})


translator = Translator()

df['City_English'] = df['City_trad_chinese'].map(lambda x: translator.translate(x, src="zh-TW", dest="en").text)

But this raises an error:

raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting value

Solution

  • You can use the googletrans library:

    import pandas as pd
    from googletrans import Translator
    
    d = {"City_trad_chinese":["香港特别行政区",
                              "澳门特别行政区",
                              "北京市",
                              "上海市"]}
    df = pd.DataFrame(data=d)
    
    translator = Translator()
    
    df["City_English"] = df["City_trad_chinese"].map(lambda x: translator.translate(x, src="zh-TW", dest="en").text)
    

    print(df["City_English"])
    
    0    Hong Kong Special Administrative Region
    1        Macao Special Administrative Region
    2                               Beijing City
    3                              Shanghai City
    

    Note: googletrans limits a single text to 15k characters. You can work around this by translating each row in its own call:

    df["City_English"] = ""
    
    for index, row in df.iterrows():
        translator = Translator()
        eng_text = translator.translate(row["City_trad_chinese"], src="zh-TW", dest="en").text
        # iterrows() yields copies of each row, so write back through the DataFrame itself
        df.at[index, "City_English"] = eng_text
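
City columns usually repeat the same few values, and each translate call is a network request that can fail (the JSONDecodeError in the question is one such failure). The following is a minimal sketch, not part of the original answer, that translates each distinct value only once and retries a few times before giving up. The names build_translation_table, translate_fn, max_retries and pause_seconds are hypothetical and introduced only for illustration; translate_fn stands in for whatever call does the actual translation.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def build_translation_table(
    values: Iterable[object],
    translate_fn: Callable[[str], str],
    max_retries: int = 3,
    pause_seconds: float = 1.0,
) -> Dict[str, Optional[str]]:
    """Return {source text: translation}, calling translate_fn once per distinct string.

    translate_fn is a hypothetical hook (an assumption of this sketch): any
    callable that takes one string and returns its translation.
    """
    table: Dict[str, Optional[str]] = {}
    for value in values:
        # Skip missing cells (None/NaN), blank strings, and values already handled.
        if not isinstance(value, str) or not value.strip() or value in table:
            continue
        table[value] = None  # stays None if every attempt fails
        for attempt in range(max_retries):
            try:
                table[value] = translate_fn(value)
                break
            except Exception:
                # Broad on purpose for a sketch: any failed request is retried.
                if attempt < max_retries - 1:
                    time.sleep(pause_seconds)
    return table
```

With the objects from the answer, this could be wired up as table = build_translation_table(df["City_trad_chinese"], lambda s: translator.translate(s, src="zh-TW", dest="en").text), followed by df["City_English"] = df["City_trad_chinese"].map(table). That assumes a googletrans version whose translate is a blocking call, as in the code above. Values that could not be translated are left empty rather than stopping the whole run.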