But I am getting empty tuples for the entities and no results for pos_.
from spacy.lang.zh import Chinese
nlp = Chinese()
doc = nlp(u"蘋果公司正考量用一億元買下英國的新創公司")
doc.ents
# returns (), i.e. empty tuple
for word in doc:
    print(word.text, word.pos_)
''' returns
蘋果
公司
正
考量
用
一
億元
買
下
英國
的
新創
公司
'''
I am new to NLP. What is the correct way to do this?
EDIT 3/21: spaCy now supports NER and POS tagging for Chinese.
Find the spaCy models here: https://spacy.io/models/zh
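The code in the question builds a blank `Chinese()` pipeline, which only tokenizes, so `pos_` and `doc.ents` stay empty. Below is a minimal sketch of loading a trained pipeline instead. It assumes the model name `zh_core_web_sm` (one of the packages listed on the page above) has been installed, and `describe` is a hypothetical helper name used only for illustration. How well the model handles traditional-character text is not something this sketch checks.

```python
def describe(doc):
    """Collect (text, POS) pairs and (text, label) entity pairs from a spaCy-style Doc."""
    tokens = [(token.text, token.pos_) for token in doc]
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return tokens, entities


if __name__ == "__main__":
    try:
        import spacy

        # Assumed model name; install it first with:
        #   python -m spacy download zh_core_web_sm
        nlp = spacy.load("zh_core_web_sm")
    except (ImportError, OSError) as err:
        # ImportError: spaCy is not installed; OSError: the model package is missing.
        print("spaCy or the Chinese model is not available:", err)
    else:
        tokens, entities = describe(nlp("蘋果公司正考量用一億元買下英國的新創公司"))
        for text, pos in tokens:
            print(text, pos)
        print(entities)
```

With a trained pipeline loaded through `spacy.load`, the tagger and entity recognizer run as part of `nlp(...)`, which is what the blank `Chinese()` object lacks.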
OLD ANSWER:
spaCy is a fantastic package, but it does not yet support Chinese, so I assume that's the reason you don't get POS results, even though your sentence is
"Apple is looking at buying U.K. startup for $1 billion"
in traditional Chinese and should therefore return "Apple" and "U.K." as ents, among others.
For a more extensive NLP approach to traditional Chinese, you can try the Stanford Chinese NLP package. Since you are using Python, note that Python versions are available (see a demo script or an intro on Medium), but the original is Java, if you are more comfortable with that.