Using the Python module re, I would like to detect sequences in a text that contain at least two letters (A-Z) and at least two digits (0-9). For example, from the text
"N03FZ467 other text N03671"
precisely the substring "N03FZ467" should be matched.
The best I have got so far is
(?=[A-Z]*\d)[A-Z0-9]{4,}
which detects sequences of length at least 4 that contain only letters A-Z and digits 0-9, and at least one digit and one letter. How can I make sure I get at least two of each?
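Start at a word boundary, \b. A lookahead, (?=(?:\d*[A-Z]){2}), checks that at least two letters lie ahead, with only digits allowed in between. Then (?:[A-Z]*\d){2} consumes everything up to and including the second digit, which guarantees at least two digits, and [A-Z\d]* matches any remaining letters and digits until another \b.
Putting it together: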
\b(?=(?:\d*[A-Z]){2})(?:[A-Z]*\d){2}[A-Z\d]*\b
See this demo at regex101 or a Python demo at tio.run
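For a quick local check without the online demos, here is a minimal sketch that applies the pattern above to the text from the question. The names PATTERN, text and matches are only illustrative and do not come from the linked demos.

```python
import re

# Pattern from the answer: the lookahead asserts two letters,
# the consuming part requires two digits, all between word boundaries.
PATTERN = re.compile(r"\b(?=(?:\d*[A-Z]){2})(?:[A-Z]*\d){2}[A-Z\d]*\b")

text = "N03FZ467 other text N03671"

# The pattern has no capturing groups, so findall returns whole matches.
matches = PATTERN.findall(text)
print(matches)  # ['N03FZ467']
```

"N03671" is rejected because the lookahead cannot find a second letter, and "other" and "text" are rejected because they are lowercase.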
Note that a lookahead is a zero-length assertion: it does not consume characters. If you don't specify a starting point such as \b, the lookahead will be tried at every position in the text, which is less efficient.
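The following sketch illustrates both points. The first part shows that a lookahead matches without consuming anything. The second part compares the pattern with a hypothetical variant that drops the leading \b (written here for comparison only); the sample string "xN03FZ467" is made up for this illustration. Besides the extra attempts, the missing anchor can also change what gets matched.

```python
import re

# A lookahead succeeds without consuming: the match is empty and ends at index 0.
m = re.match(r"(?=[A-Z]{2})", "AB12")
print(repr(m.group()), m.end())  # '' 0

anchored = re.compile(r"\b(?=(?:\d*[A-Z]){2})(?:[A-Z]*\d){2}[A-Z\d]*\b")
# Hypothetical variant without the leading \b, for comparison only.
unanchored = re.compile(r"(?=(?:\d*[A-Z]){2})(?:[A-Z]*\d){2}[A-Z\d]*\b")

sample = "xN03FZ467"  # made-up token with a lowercase prefix

# With \b, only the start of the token is tried, and "x" fails the lookahead.
print(anchored.findall(sample))    # []
# Without \b, the lookahead is also tried inside the token and succeeds at "N".
print(unanchored.findall(sample))  # ['N03FZ467']
```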
Also note that the minimum length of four is already implied by the requirements: at least two letters plus at least two digits gives at least four characters, so no explicit {4,} is needed.