I'm trying to get multiple regex-matches with the find/find_all-method, but can't get it to work.
A piece of the HTML-code can be something like:
<b>Week</b> 22: 3871983
Then in code I'm trying the following:
import re
from robobrowser import RoboBrowser
browser = RoboBrowser(parser='html.parser')
browser.open(some_url_containing_the_above_html_code)
result = browser.find_all(text=re.compile(r'Week\s+(\d+).*?(\d+)'))
print(result)
Which outputs something like:
['Week 22:\xa3871983']
I expected something like:
['22', '3871983']
Does the \xa ruin it? Or is it simply not possible to return multiple matches from a single regex? I don't really know how to solve it. I could always store the return value in a string and parse it one more time with a split or a regex, but I'd rather get it directly with find or find_all.
This is a misunderstanding about what find_all does. All it does is return a list of the elements that match the given condition, which in your case is a regex. Your regex has capture groups, but that is not relevant here: find_all uses the regex only to decide which elements match, and does not split the text or return the groups. So
['Week 22:\xa3871983']
is the expected result. If you want this converted into ['22', '3871983'], you have to post-process each returned string yourself:
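import re
pattern = re.compile(r'Week\s+(\d+).*?(\d+)')
for text in result:  # result is the list returned by find_all in the question
    match = pattern.search(text)
    if match:
        parts = list(match.groups())  # the two captured numbers, as strings
        print(parts)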
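To make the filter-versus-extract distinction concrete, here is a minimal sketch that uses only the standard re module. The list texts is a hypothetical stand-in for the strings find_all returns, and the non-breaking space (\xa0) before the second number is an assumption about what the page really contains.

```python
import re

# Hypothetical sample strings standing in for what find_all returns.
# The \xa0 (non-breaking space) before the second number is an assumption.
texts = ['Week 22:\xa03871983', 'Unrelated text', 'Week 23:\xa04012345']

pattern = re.compile(r'Week\s+(\d+).*?(\d+)')

# Step 1: filtering. The regex only decides which strings are kept,
# which is roughly the role it plays when passed to find_all.
matching = [t for t in texts if pattern.search(t)]

# Step 2: extraction. The capture groups are read in a separate pass.
extracted = [list(pattern.search(t).groups()) for t in matching]

print(matching)
print(extracted)
```

The first step keeps whole strings and the second pulls out the groups, so the parsing pass the question hoped to avoid is still needed, but it reuses the same compiled pattern.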