Tags: regex, python-3.x, findall

How to create non-capturing groups (how to apply `?` to a combination of two subexpressions when using re.findall in Python)?


I want to return all the words that start and end with a letter or a number. A word may contain at most one period (`.`) or hyphen (`-`). So `ab.ab` is valid, but `ab.` is not.

import re
reg = r"[\d\w]+([-.][\d\w]+)?"
s = "sample text"
print(re.findall(reg, s))

It is not working because of the parentheses. How can I apply the `?` to the combination `[-.][\d\w]+`?
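A minimal sketch of why the parentheses matter: when a pattern contains a capturing group, `re.findall` returns that group's text for each match instead of the whole match. The sample string is made up for illustration; the second pattern simply swaps `(...)` for the non-capturing form `(?:...)`.

```python
import re

text = "ab.ab cd-ef gh"

# Capturing group: findall returns only what group 1 captured in each match
# (an empty string when the optional group did not participate)
capturing = re.findall(r"[\d\w]+([-.][\d\w]+)?", text)

# Non-capturing group (?:...): findall returns the whole match
non_capturing = re.findall(r"[\d\w]+(?:[-.][\d\w]+)?", text)

print(capturing)      # → ['.ab', '-ef', '']
print(non_capturing)  # → ['ab.ab', 'cd-ef', 'gh']
```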


Solution

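  • If `ab.` is not valid and should not be matched, and the period or hyphen may not appear at the start or end of a word, you could match one or more letters or digits, followed by an optional part that matches a dot or a hyphen followed by one or more letters or digits. Because `re.findall` returns the text of the capturing groups (rather than the whole match) whenever the pattern contains any, write that optional part as a non-capturing group `(?:...)`; the `?` quantifier still applies to the whole group.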

    (?<!\S)[a-zA-Z\d]+(?:[.-][a-zA-Z\d]+)?(?!\S)


    Explanation

    • (?<!\S) Negative lookbehind asserting that the character to the left is not a non-whitespace character, so the match starts at the beginning of the string or after whitespace
    • [a-zA-Z\d]+ Match one or more lowercase or uppercase ASCII letters or digits
    • (?:[.-][a-zA-Z\d]+)? An optional non-capturing group that matches a dot or a hyphen followed by one or more lowercase or uppercase letters or digits
    • (?!\S) Negative lookahead asserting that the character to the right is not a non-whitespace character, so the match ends at the end of the string or before whitespace

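A minimal Python sketch applying the pattern above with `re.findall`. The input string is hypothetical, chosen to include valid words, a word with a trailing period, a word with two separators, and a word with a leading hyphen.

```python
import re

# Pattern from the answer: letters/digits, optionally one "." or "-" followed by
# more letters/digits, bounded by whitespace or the start/end of the string
pattern = r"(?<!\S)[a-zA-Z\d]+(?:[.-][a-zA-Z\d]+)?(?!\S)"

text = "ab.ab ab. a-b 1.2.3 -x foo"

# The pattern has no capturing groups, so findall returns the whole matches
words = re.findall(pattern, text)
print(words)  # → ['ab.ab', 'a-b', 'foo']
```

`ab.` and `-x` are rejected because the lookarounds require each match to be a whole whitespace-delimited word that starts and ends with a letter or digit, and `1.2.3` is rejected because the optional group allows only one separator.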