How to use the shell to count Chinese characters in a UTF-8 encoded file

Running cat doc.txt shows the following:

你好 Hello!
这是中文。This is a Chinese doc.

I can use the command

wc -w doc.txt

but it will show:

8 doc.txt

This command treats 你好 and 这是中文 each as a single word, while in fact 你好 is two Chinese words and 这是中文 is four.

What I want is to count these Chinese words correctly (there are 12 words in the example). Could anyone help?
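
To make the target count concrete, here is a minimal sketch of one way to get it, assuming perl is installed. count_words is a hypothetical helper name, not a standard tool. It counts each Han ideograph as one word and each run of ASCII letters or digits as one word; punctuation such as 。 and ! is ignored.

```shell
# count_words FILE: print the number of "words" in a UTF-8 file.
# Hypothetical helper: each Han ideograph counts as one word, and each
# run of ASCII letters/digits counts as one word.
count_words() {
    # -CS makes perl decode standard input as UTF-8, independent of the locale.
    perl -CS -ne '
        $n++ for /\p{Han}|[A-Za-z0-9]+/g;   # one hit per ideograph or Latin word
        END { print $n + 0, "\n" }           # "+ 0" prints 0 for empty input
    ' < "$1"
}

# Recreate the sample file from the question, then count its words.
printf '%s\n' '你好 Hello!' '这是中文。This is a Chinese doc.' > doc.txt
count_words doc.txt   # prints 12
```

The pattern only covers Han ideographs and ASCII words; other scripts or accented Latin letters would need a wider pattern.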

Solution

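You can use wc with the -m (or --chars) option, which counts characters instead of words. It depends on the current locale, so the locale must be a UTF-8 one: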

$ echo -n "你好" | wc -m

Output: