Is there a way to find word boundaries in a Japanese string (e.g. "私はマーケットに行きました。") using JavaScript regular expressions? The "xregexp" JS library can be used.
E.g.:
var xr = RegExp("\\bst","g");
xr.test("The string") // --> true
I need the same logic for Japanese strings.
However, the actual problem of separating a Japanese sentence into words is more complicated than it appears, since words are not separated by spaces as they are in English, for example.
For example, the sentence 私はマーケットに行きました。 ("I went to the market") has the following words:
A reliable parser of Japanese sentences would, among other things, have to find where the particles (は "wa" and に "ni") lie in the sentence, in order to find the remaining words.
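If a regex is not a hard requirement, one possible approach is the built-in Intl.Segmenter API, which is available in recent browsers and Node.js versions. The snippet below is only a sketch under that assumption: segmentWords is a hypothetical helper name, and the exact segmentation depends on the dictionary data shipped with the runtime, so it may not always match what a dedicated morphological analyzer would produce. It also shows why \b does not help here: \b only looks at ASCII word characters, so it never matches inside a purely Japanese string.

```javascript
// Sketch: split text into segments with the built-in Intl.Segmenter.
// segmentWords is a hypothetical helper name, not part of any library.
function segmentWords(text, locale = "ja") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  // Each item carries the segment text, its start index in the input,
  // and a flag telling whether it looks like a word (vs. punctuation).
  return Array.from(segmenter.segment(text), ({ segment, index, isWordLike }) => ({
    segment,
    index,
    isWordLike,
  }));
}

const sentence = "私はマーケットに行きました。";

// \b is defined in terms of [A-Za-z0-9_], so no position in this
// string counts as a word boundary.
console.log(/\bマ/.test(sentence)); // false

const segments = segmentWords(sentence);

// Start offsets of the word-like segments can serve as word boundaries.
const boundaries = segments.filter((s) => s.isWordLike).map((s) => s.index);

console.log(segments.map((s) => s.segment).join(" | "));
console.log(boundaries);
```

The offsets in boundaries can then be used in place of \b matches, for example to check whether a search hit starts at the beginning of a word.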