
How to use regex to find different rules?


Here is my code:

import re
text = '''--abc—     --cba--'''
res = re.findall("[-]+(.*?)[-]+|[-]+(.*?)[—]+", text)
# [('abc—     ', ''), ('', '')]
res_02 = re.findall("-+.*?-+|-+.*?—+", text)
# ['--abc—     --', '--']

What I want is:

res =  ['abc', 'cba']
res_02 = ['--abc—', '--cba--']

How should I modify it?


Solution

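  • The problem with your first regex is that its first alternative, [-]+(.*?)[-]+, only accepts hyphens as the closing delimiter. In --abc— the closer is an em dash (—, U+2014), so the lazy .*? keeps expanding past it until it reaches the -- before cba, and the second alternative never gets a chance to match. The alternation is unnecessary: both branches open the same way, so merge the two closers into one character class, [-—]+. Your second regex has the same problem and the same fix.

    import re
    
    text = '''--abc—     --cba--'''
    # Open with hyphens, take everything that is not a delimiter,
    # then close with hyphens or em dashes.
    res = re.findall(r'-+([^-—]*)[-—]+', text)
    # ['abc', 'cba']
    res_02 = re.findall(r'-+[^-—]*[-—]+', text)
    # ['--abc—', '--cba--']
    

    Note that I would use a negated class, [^-—]*, rather than .*?: it cannot run past the first closing delimiter, and it is typically more efficient because it avoids character-by-character lazy expansion. Also, you don't need to put a single - in a character class ([-]) as you did in your first regex; a plain - is enough.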
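
    To check which dash is doing what, it can help to write the em dash as an escape (\u2014) so it cannot be confused with a hyphen. The sketch below is only an illustration: the names EM_DASH, original and merged are mine, not from the question, and it assumes the character after abc really is U+2014, as the output quoted in the question suggests. It runs the question's pattern next to a single pattern whose closing delimiter accepts either dash.

```python
import re

# "\u2014" is the em dash; the escape makes it obvious which dash is which.
EM_DASH = "\u2014"
text = f"--abc{EM_DASH}     --cba--"

# Pattern from the question: the first branch only closes on hyphens,
# so the lazy group runs past the em dash to the next "--".
original = re.compile(r"[-]+(.*?)[-]+|[-]+(.*?)[\u2014]+")

# Hypothetical merged pattern: hyphens open, hyphens or em dashes close.
merged = re.compile(r"-+([^-\u2014]*)[-\u2014]+")

print(original.findall(text))  # [('abc—     ', ''), ('', '')]
print(merged.findall(text))    # ['abc', 'cba']

# group(0) is the whole match, delimiters included.
print([m.group(0) for m in merged.finditer(text)])  # ['--abc—', '--cba--']
```

    Using finditer with group(0) yields the full matches from the same compiled pattern, so one pattern can produce both the inner text and the delimited text.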