I need a Python regex that matches parts of a string multiple times, so that the matches overlap:
My String: aaaa-bb-ccc-dd
My Pattern: ([A-z]+)\-([A-z]+)
I would like to have groups like this:
1: aaaa-bb
2: bb-ccc
3: ccc-dd
Does somebody have an idea on how to do this? If it does not work with regex only, a solution with a python for loop is also very welcome.
You can use lookahead to get overlapping matches:
(?=\b([A-Za-z]+-[A-Za-z]+)\b)
See the regex demo.
Details:
(?= - start of a positive lookahead that matches a location that is immediately followed with
\b - a word boundary
([A-Za-z]+-[A-Za-z]+) - Group 1: one or more ASCII letters, -, one or more ASCII letters
\b - a word boundary
) - end of the lookahead.
In Python, use it with re.findall:
import re
text = "aaaa-bb-ccc-dd"
print( re.findall(r'(?=\b([A-Z]+-[A-Z]+)\b)', text, re.I) )
# => ['aaaa-bb', 'bb-ccc', 'ccc-dd']
See the Python demo. Note I changed [A-Za-z] to [A-Z] in the code since I made the regex matching case insensitive with the help of the re.I option. Make sure you are using the r string literal prefix, or \b will be treated as a BACKSPACE char, \x08, and not a word boundary.
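To make the points above concrete, here is a small sketch comparing a plain (consuming) pattern, the lookahead pattern, and the same lookahead written without the r prefix. It also includes a regex-free loop, since the question asked for one. The variable names are my own, and the loop assumes the input consists only of hyphen-separated runs of letters.

```python
import re

text = "aaaa-bb-ccc-dd"

# Consuming pattern: each match eats its characters, so "bb" cannot be reused.
consuming = re.findall(r'[A-Za-z]+-[A-Za-z]+', text)

# Lookahead pattern: the match itself is zero-width, so the next search
# starts one position later and overlapping pairs are all captured.
overlapping = re.findall(r'(?=\b([A-Za-z]+-[A-Za-z]+)\b)', text)

# Same pattern without the r prefix: '\b' becomes a backspace character
# (\x08), which never occurs in the text, so nothing matches.
no_raw_prefix = re.findall('(?=\b([A-Za-z]+-[A-Za-z]+)\b)', text)

# Regex-free alternative (assumes only hyphen-separated letter runs):
# split on '-' and join each part with its right-hand neighbour.
parts = text.split('-')
pairs = [f"{left}-{right}" for left, right in zip(parts, parts[1:])]

print(consuming)      # ['aaaa-bb', 'ccc-dd']
print(overlapping)    # ['aaaa-bb', 'bb-ccc', 'ccc-dd']
print(no_raw_prefix)  # []
print(pairs)          # ['aaaa-bb', 'bb-ccc', 'ccc-dd']
```

The loop is simpler if the input format is guaranteed, while the regex also validates that each side of the hyphen is made of letters.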
Variations
(?=\b([^\W\d_]+-[^\W\d_]+)\b) - matching any Unicode letters
(?=(?<![^\W\d_])([^\W\d_]+-[^\W\d_]+)(?![^\W\d_])) - matching any Unicode letters, with the boundaries being any non-letters
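Here is a quick sketch of how the two variations behave, assuming Python 3 str patterns (which are Unicode-aware by default). The sample strings and variable names are made up for illustration.

```python
import re

# [^\W\d_] means "a word character that is neither a digit nor an underscore",
# which in Python 3 str patterns amounts to any Unicode letter.
unicode_word_boundary = r'(?=\b([^\W\d_]+-[^\W\d_]+)\b)'
unicode_letter_boundary = r'(?=(?<![^\W\d_])([^\W\d_]+-[^\W\d_]+)(?![^\W\d_]))'
ascii_only = r'(?=\b([A-Z]+-[A-Z]+)\b)'

cyrillic = "альфа-бета-гамма"

# The ASCII pattern finds nothing in non-Latin text, even with re.I.
ascii_result = re.findall(ascii_only, cyrillic, re.I)
# The Unicode variation finds the overlapping pairs.
unicode_result = re.findall(unicode_word_boundary, cyrillic)

# A digit right after the pair: \b does not see a boundary between "d" and "1"
# (both are word characters), but the lookaround version only rejects letters.
with_digit = "ab-cd1"
word_boundary_result = re.findall(unicode_word_boundary, with_digit)
letter_boundary_result = re.findall(unicode_letter_boundary, with_digit)

print(ascii_result)            # []
print(unicode_result)          # ['альфа-бета', 'бета-гамма']
print(word_boundary_result)    # []
print(letter_boundary_result)  # ['ab-cd']
```

Which boundary to pick depends on whether digits and underscores next to a pair should block the match (use \b) or be ignored (use the lookarounds).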