
Regex for Matching Numeral Pinyin

I'm looking for a regex to match a numeral pinyin lexical unit (one or more pinyin syllables with tone numbers, written without spaces).

The question "Regex for Matching Pinyin" was a good starting point: I was able to quickly add support for tone numerals by doing:


Essentially, I wrapped the old regexp in a group and appended the numeral condition. However, I'm not able to extend this to the case of multiple words (several syllables written together). For instance:

jiao4zuo4zhi1wu4    叫座之物
jiao4zu3    教祖
jiao4zong1xuan3ju3  教宗选举
jiao4zi3    教子
jiao4zhun3yi2qi4    校准仪器
jiao4zhun3tiao2     校准条
jiao4zhun3ti1chi3   校准梯尺
jiao4zhun3quan1     校准圈
jiao4zhun3qi4   校准器
jiao4zhun3pu3   校准谱 

N.B.: This expression will be used in a JavaScript context.


  • Here is the regexp I'm using, based on @EagleV_Attnam's solution with some additions of my own:


    Adding the start (^) and end ($) anchors solved my issue :)
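
    A minimal sketch of why the anchors matter. The syllable pattern here is a deliberately loose approximation assumed for illustration; it is not the regex from this answer or from @EagleV_Attnam's solution, and it accepts some strings that are not valid pinyin. The names `syllable`, `tone`, `unanchored`, `numeralPinyin` and `isNumeralPinyin` are hypothetical, as is the choice of 1 to 5 as tone digits.

```javascript
// Loose approximation of one pinyin syllable (assumption for illustration only):
// optional initial, one or more vowels, optional n / ng / r ending.
const syllable = "(?:zh|ch|sh|[bpmfdtnlgkhjqxrzcsyw])?[aeiouüv]+(?:ng|n|r)?";

// Tone digit appended to every syllable (assumes tones are written 1-5).
const tone = "[1-5]";

// Without anchors, the pattern matches if ANY substring is numeral pinyin.
const unanchored = new RegExp(`(?:${syllable}${tone})+`, "i");

// With ^ and $, the WHOLE string must be one or more syllable+tone groups.
const numeralPinyin = new RegExp(`^(?:${syllable}${tone})+$`, "i");

function isNumeralPinyin(text) {
  return numeralPinyin.test(text);
}

const samples = ["jiao4zuo4zhi1wu4", "jiao4zong1xuan3ju3", "jiao4 zu3", "jiao4zu"];
for (const s of samples) {
  console.log(s, unanchored.test(s), isNumeralPinyin(s));
}
```

    With the anchors in place, a string containing a space or a trailing syllable without a tone digit is rejected as a whole, whereas the unanchored version still reports a match on the valid prefix.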

    Full regex is:
