How can I match a regex against multiple Unicode properties combined with AND (JavaScript)?


I want to match only characters that satisfy both of the following regexes:

  • /\p{Script=Han}/gu AND /\p{Alphabetic}/gu

This would mean that:

  • a character that is both Han and Alphabetic matches
  • a character that is Alphabetic but not Han doesn't match
  • a character that is Han but not Alphabetic (for example, a radical) doesn't match

Ideally someone can show me how to do it with browser-based JavaScript.

PS:

I was using this before: /[\u4e00-\u9faf\u3400-\u4dbf]/g
but the issue is that it doesn't match all Han characters, so I would rather use /\p{Script=Han}/gu while excluding non-alphabetic characters such as radicals.


Solution

  • You can use a positive lookahead assertion. The lookahead checks that the next character is Han without consuming it, and \p{Alphabetic} then matches that same character only if it is also Alphabetic:

    const expr = /(?=\p{Script=Han})\p{Alphabetic}/gu;
    

    I believe this gives the output you want.
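
To check the behaviour, here is a minimal sketch that runs the expression above against one character from each of the three cases in the question. The sample string and the helper name `isHanAlphabetic` are assumptions made up for illustration. It needs an engine that supports Unicode property escapes with the `u` flag.

```javascript
// The lookahead tests Script=Han without consuming anything;
// \p{Alphabetic} then consumes the same character if it is also Alphabetic.
const hanAlphabetic = /(?=\p{Script=Han})\p{Alphabetic}/gu;

// Hypothetical sample, one character per case from the question:
//   U+4E2D  Han ideograph (Han and Alphabetic)
//   "a"     Latin letter  (Alphabetic, not Han)
//   U+2F00  Kangxi radical (Han, not Alphabetic)
const sample = "\u4E2Da\u2F00";

// String.prototype.match with a global regex returns every match, or null.
const matches = sample.match(hanAlphabetic) ?? [];
console.log(matches); // only the U+4E2D ideograph is expected here

// Hypothetical helper for testing a single character. It uses a non-global
// regex so that repeated calls are not affected by lastIndex.
function isHanAlphabetic(ch) {
  return /^(?=\p{Script=Han})\p{Alphabetic}$/u.test(ch);
}

console.log(isHanAlphabetic("\u4E2D")); // true
console.log(isHanAlphabetic("a"));      // false
console.log(isHanAlphabetic("\u2F00")); // false
```

The order of the two properties should not matter for the result, since both conditions apply to the same single character. Engines that support the newer `v` flag can also express this as a set intersection, `/[\p{Script=Han}&&\p{Alphabetic}]/v`, but that flag is not available in older browsers.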