I'm using Python 3.6.8; other versions are not an option. I need to convert a pandas column from Chinese to English; around 20% of its text is Chinese. Due to client requirements, I cannot use a translation API or a library like Google Translate; instead I must use the pinyin package.
So I wrote the following code:
import pinyin
df['Pinyin_Text'] = df['Chinese_Text'].apply(lambda text: pinyin.get(text,format="strip", delimiter=" "))
However, the resulting Pinyin_Text field contains a phonetic transcription. I would like to format the Pinyin_Text field.
Can you suggest how I can achieve that?
You can achieve this by passing style=Style.NORMAL to the pinyin function of the pypinyin library. You may need to adjust your setup if you have a different version of the library installed; this example uses version 0.53:
pip install pypinyin
The following code produces the desired result, using " " as the delimiter:
# Uses pypinyin 0.53
from pypinyin import pinyin, Style

chinese_text = "你好 你好,世界 你好"
# pinyin() returns one list of candidate readings per character;
# Style.NORMAL gives plain letters without tone marks.
pinyin_lists = pinyin(chinese_text, style=Style.NORMAL)
# Take the first reading of each character and join with spaces.
plain_pinyin = ' '.join(readings[0] for readings in pinyin_lists)
print(plain_pinyin)
The result is the pronunciation spelled in plain alphabetic letters instead of phonetic symbols. I did not test it with numbers, but this should do the task.