I have a column "City_trad_chinese" in a pandas DataFrame "df" that contains values in Traditional Chinese. I need to create another column, "City_English", containing those values translated into English.
How can I do this with Python? I tried the following:
#importing required libraries
import pandas as pd
from os import path
from googletrans import Translator
#setting path to data
path2data = 'C:/Users/data'
# data import
df = pd.read_excel(path.join(path2data, 'data.xlsx'), converters={'City_trad_chinese':str})
translator = Translator()
df['City_English'] = df['City_trad_chinese'].map(lambda x: translator.translate(x, src="zh-TW", dest="en").text)
but it is giving me an error:
raise JSONDecodeError("Expecting value", s, err.value) from None
JSONDecodeError: Expecting value
You can use the googletrans library:
import pandas as pd
from googletrans import Translator
d = {"City_trad_chinese":["香港特别行政区",
"澳门特别行政区",
"北京市",
"上海市"]}
df = pd.DataFrame(data=d)
translator = Translator()
df["City_English"] = df["City_trad_chinese"].map(lambda x: translator.translate(x, src="zh-TW", dest="en").text)
print(df["City_English"])
0 Hong Kong Special Administrative Region
1 Macao Special Administrative Region
2 Beijing City
3 Shanghai City
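Because `map` calls the translator once per row, a column with many repeated city names sends the same text to the service over and over. A minimal sketch, assuming you only want one request per distinct value: translate the `unique()` values once, store them in a dict, and map the dict back onto the column. `translate_unique` and `fake_translate` are hypothetical names; in real use you would pass something like `lambda x: translator.translate(x, src="zh-TW", dest="en").text` instead of the offline stand-in.

```python
import pandas as pd


def translate_unique(series, translate_fn):
    """Translate each distinct non-null value in `series` once, then map results back.

    `translate_fn` is any callable str -> str (hypothetical parameter; in practice
    it would wrap googletrans). Missing values stay NaN.
    """
    lookup = {value: translate_fn(value) for value in series.dropna().unique()}
    return series.map(lookup)


# Offline stand-in for the real translator, recording every call it receives
calls = []
fake_dictionary = {"北京市": "Beijing City", "上海市": "Shanghai City"}


def fake_translate(text):
    calls.append(text)
    return fake_dictionary[text]


df = pd.DataFrame({"City_trad_chinese": ["北京市", "上海市", "北京市", "北京市"]})
df["City_English"] = translate_unique(df["City_trad_chinese"], fake_translate)
print(df["City_English"].tolist())  # ['Beijing City', 'Shanghai City', 'Beijing City', 'Beijing City']
print(len(calls))  # 2
```

Four rows produce only two translator calls, which matters when the DataFrame has thousands of rows but only a few hundred distinct cities.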
Note: googletrans has a 15k-character limit on a single text. You can work around it by translating each row individually:
df["City_English"] = ""
translator = Translator()
for index, row in df.iterrows():
    eng_text = translator.translate(row["City_trad_chinese"], src="zh-TW", dest="en").text
    # write back through df.at; assigning to `row` only changes a copy
    df.at[index, "City_English"] = eng_text
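Translating row by row keeps each request to a single cell, but one cell that is itself longer than the limit would still fail. A minimal sketch, assuming it is acceptable to split such a cell after Chinese full stops (`。`) and translate the pieces separately: `split_for_translation` is a hypothetical helper, `max_chars=15000` mirrors the limit mentioned above, and it assumes the limit is measured in Python characters (`len`).

```python
def split_for_translation(text, max_chars=15000, sep="。"):
    """Split `text` into pieces of at most `max_chars` characters.

    Pieces break after `sep` where possible; a single sentence longer than
    `max_chars` is hard-split. Joining the pieces reproduces `text` exactly.
    """
    parts = text.split(sep)
    # Keep the separator attached to every sentence except a trailing fragment
    sentences = [p + sep for p in parts[:-1]]
    if parts[-1]:
        sentences.append(parts[-1])

    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split a sentence that is too long on its own
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Start a new chunk if adding this sentence would exceed the limit
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks


long_text = "北京市是中国的首都。" * 3000  # 10 characters x 3000 = 30,000 characters
chunks = split_for_translation(long_text)
print(len(chunks), max(len(c) for c in chunks))  # 2 15000
```

Each chunk could then be passed to `translator.translate(chunk, src="zh-TW", dest="en").text` and the English pieces joined with a space; for short values such as city names the helper simply returns the text as a single chunk.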