regex python-3.x regex-greedy

Python regular expression problem with greedy matching


I'm following an online course, and I have a problem with a regular expression.

From "http://py4e-data.dr-chuck.net/known_by_Anayah.html" I'd like to extract only "Anayah".

This is my attempt:

import re
stringToParse = "http://py4e-data.dr-chuck.net/known_by_Anayah.html, we just want Anayah"
print(re.search(r'[_](\w+)\.html', stringToParse).group(1))

This returns "by_Anayah", so the "by_" part is giving me some problems...

I know that ? makes a quantifier non-greedy, but wherever I insert it, I never get what I want.

Thank you for any help :)


Solution

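  • This happens because \w also matches the _ character, so greediness is not the cause: the match starts at the first underscore from which the rest of the pattern can succeed, and a lazy quantifier does not change where a match starts. Replace \w with [^\W_], which matches every word character except the underscore.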

    Use

    import re
    stringToParse = "http://py4e-data.dr-chuck.net/known_by_Anayah.html, we just want Anayah"
    print(re.search(r'_([^\W_]+)\.html', stringToParse).group(1))
    

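To see why adding ? does not help, here is a small side-by-side sketch using the string from the question. The variable names (url, greedy, lazy, fixed) are just illustrative and not part of the original post.

```python
import re

url = "http://py4e-data.dr-chuck.net/known_by_Anayah.html, we just want Anayah"

# Original pattern: \w includes "_", so the match begins at the first
# underscore (after "known") and the group swallows "by_Anayah".
greedy = re.search(r'_(\w+)\.html', url).group(1)

# Lazy quantifier: the match still begins at the first underscore, and
# \w+? has to keep expanding until "\.html" can match, so nothing changes.
lazy = re.search(r'_(\w+?)\.html', url).group(1)

# Excluding "_" from the class: the attempt at the first underscore fails
# (after "by" comes "_", not "."), so the engine moves on to the last one.
fixed = re.search(r'_([^\W_]+)\.html', url).group(1)

print(greedy)  # by_Anayah
print(lazy)    # by_Anayah
print(fixed)   # Anayah
```

The regex engine always reports the leftmost possible match; greedy versus lazy only decides how far that match extends once its starting position is fixed. That is why the fix has to change what the group is allowed to contain rather than how eagerly it matches.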