I want to count the number of words in a passage that contains both English and Chinese. For English, it's simple. Each word is a word. For Chinese, we count each character as a word. Therefore, 香港人 is three words here.
So for example, "I am a 香港人" should have a word count of 6.
Any idea how I can count this in JavaScript/jQuery?
Thanks!
Try a regex like this:
/[\u00ff-\uffff]|\S+/g
For example, "I am a 香港人".match(/[\u00ff-\uffff]|\S+/g)
gives:
["I", "am", "a", "香", "港", "人"]
Then you can just check the length of the resulting array.
The \u00ff-\uffff part of the regex is a Unicode character range; you probably want to narrow this down to just the characters you want to count as words. For example, CJK Unified would be \u4e00-\u9fcc.
function countWords(str) {
    var matches = str.match(/[\u00ff-\uffff]|\S+/g);
    // match() returns null when nothing matches, so fall back to 0.
    return matches ? matches.length : 0;
}
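One thing to watch for: \S+ also matches Chinese characters, so when an English word runs straight into Chinese with no space between them (e.g. "Hello香港人"), the regex above returns the whole run as a single match. Here is a rough sketch of a narrowed version that avoids this. It assumes you only want to count characters in the CJK Unified Ideographs block (\u4e00-\u9fff) individually; the name countWordsCjk is just made up for illustration, and you would need to widen the range if you also care about extension blocks, kana, or full-width punctuation.

```javascript
// Hypothetical helper: counts each CJK ideograph as one word and each
// run of other non-whitespace characters as one word.
// Assumption: "CJK" here means only the block U+4E00 to U+9FFF.
function countWordsCjk(str) {
    // First alternative: a single CJK ideograph.
    // Second alternative: a run of characters that are neither
    // whitespace nor CJK, so a Latin word stops where Chinese begins.
    var matches = str.match(/[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+/g);
    // match() returns null when nothing matches.
    return matches ? matches.length : 0;
}

console.log(countWordsCjk("I am a 香港人")); // 6
console.log(countWordsCjk("Hello香港人"));   // 4
```

Note that anything outside the chosen range, such as full-width punctuation, falls into the second alternative and is counted as part of (or as) an ordinary word, so adjust the character classes to suit your text.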