Running cat doc.txt shows the following text:
你好 Hello!
这是中文。This is a Chinese doc.
I tried counting the words with the command
wc -w doc.txt
but it will show:
8 doc.txt
This command treats 你好 and 这是中文 each as a single word, when in fact 你好 is two Chinese words and 这是中文 is four.
What I want is to count these Chinese words correctly (there are 12 words in the example). Could anyone help?
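You can use the -m (or --chars) option, which counts characters instead of words (multibyte characters are only counted correctly in a UTF-8 locale):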
$ echo -n "你好" | wc -m
Output:
2