I'm following an online course, and I have a problem with a regular expression.
From "http://py4e-data.dr-chuck.net/known_by_Anayah.html" I'd like to extract only "Anayah"
This is my try:
import re
stringToParse = "http://py4e-data.dr-chuck.net/known_by_Anayah.html, we just want Anayah"
print(re.search(r'[_](\w+)\.html', stringToParse).group(1))
This returns "by_Anayah", so the "by_" part is giving me some problems...
I know about ? for being non-greedy, but wherever I try to insert ?, I never get what I want.
Thank you for any help :)
It is because \w also matches _ chars. Replace \w with [^\W_], which matches all word chars except underscores.
Use
import re
stringToParse = "http://py4e-data.dr-chuck.net/known_by_Anayah.html, we just want Anayah"
print(re.search(r'_([^\W_]+)\.html', stringToParse).group(1))
See the Python demo and the regex demo.
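As for the ? attempts: making \w+ lazy does not help here, because the regex engine still starts the match at the first underscore it finds (the one after "known") and then expands until \.html can match. A small comparison sketch, where the dictionary keys are just labels made up for illustration:

```python
import re

# Sample string taken from the question.
string_to_parse = "http://py4e-data.dr-chuck.net/known_by_Anayah.html, we just want Anayah"

# Hypothetical labels for the three patterns being compared.
patterns = {
    "original": r'_(\w+)\.html',          # \w also matches underscores
    "lazy": r'_(\w+?)\.html',             # lazy quantifier, same starting position
    "no_underscore": r'_([^\W_]+)\.html', # word chars except underscore
}

# Run each pattern against the same string and keep the first capture group.
results = {
    name: re.search(pattern, string_to_parse).group(1)
    for name, pattern in patterns.items()
}

for name, value in results.items():
    print(f"{name}: {value}")
# original: by_Anayah
# lazy: by_Anayah
# no_underscore: Anayah
```

Only the pattern that excludes underscores from the captured part is forced to begin right after the last underscore before .html.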