python-3.x regex python-re

How can I use regex to match numbers followed by Chinese characters?


import re
text = "我去臺中10天9夜,我去臺中10天九夜"

I have text like this.

res = re.findall(regex, text)
print(res)
# ["10天9夜", "10天九夜"]

I want a regex that produces the result shown above. What pattern should I use?

If I only want to match ["10天9夜"], I can use re.findall(r"\d+天\d+夜", text), but it only matches the first one, because 九 is not matched by \d.


Solution

  • I suggest using

    re.findall(r'(?:\d+[^\W\d_]+)+', text)
    

    This pattern matches one or more consecutive sequences of digits followed by letters.

    Details:

    • (?: - start of a "container", a non-capturing group
      • \d+ - one or more digits
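      • [^\W\d_]+ - one or more letters (word characters other than digits and the underscore; in Python 3 this includes Chinese characters such as 天 and 九)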
    • )+ - end of the group, repeat one or more times.
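
    To make the breakdown above concrete, here is a minimal runnable sketch that applies the suggested pattern to the text from the question. The second pattern, `cjk_only`, and the `mixed` sample string are not part of the original answer; they are a hypothetical variant, assuming you want to accept only characters from the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) rather than any letter.

```python
import re

text = "我去臺中10天9夜,我去臺中10天九夜"

# One or more repetitions of "digits followed by letters".
# [^\W\d_] means: a word character that is not a digit and not an underscore.
pattern = re.compile(r"(?:\d+[^\W\d_]+)+")

res = pattern.findall(text)
print(res)  # ['10天9夜', '10天九夜']

# Hypothetical stricter variant (an assumption, not part of the answer):
# accept only characters from the basic CJK Unified Ideographs block.
cjk_only = re.compile(r"(?:\d+[\u4e00-\u9fff]+)+")

mixed = "stay 3nights, 10天9夜"  # made-up sample mixing Latin and Chinese text
general = pattern.findall(mixed)
strict = cjk_only.findall(mixed)
print(general)  # ['3nights', '10天9夜']
print(strict)   # ['10天9夜']
```

    Because the group is non-capturing, re.findall returns the whole match for each hit rather than only the contents of the group. Which variant fits better probably depends on whether Latin words after a number should count as matches.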