I have a Python script that uses BeautifulSoup to scrape a page. This is my code:
re.findall('stream:\/\/.+', link)
It is designed to find links like:
stream://987cds9c8ujru56236te2ys28u99u2s
But it also returns strings like this:
stream://987cds9c8ujru56236te2ys28u99u2s [SD] Spanish - (9.15am)
i.e. with spaces and extra text that I don't want. How can I write the re.findall call so that it returns only the link itself?
(Thanks in advance)
You can use a non-greedy match (adding ? after the + quantifier) followed by a word boundary, '\b':
>>> re.findall(r'stream:\/\/.+?\b', link)
['stream://987cds9c8ujru56236te2ys28u99u2s']
Or, if you want to match only word characters, you can simply use '\w+':
>>> re.findall(r'stream:\/\/\w+', link)
['stream://987cds9c8ujru56236te2ys28u99u2s']
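To compare the two patterns side by side, here is a minimal sketch that runs them on the sample string from the question. The third pattern, '\S+', is not part of the answer above; it is an assumption about what you might want if the IDs can contain characters such as '-' or '.', since both '\b' and '\w+' stop at the first non-word character. The variable names are only illustrative.

```python
import re

# Sample text copied from the question; in a real script this would be
# the text extracted from the scraped page.
link = "stream://987cds9c8ujru56236te2ys28u99u2s [SD] Spanish - (9.15am)"

# Note: '/' has no special meaning in Python regexes, so it does not
# need to be escaped.

# Non-greedy match that stops at the first word boundary.
lazy = re.findall(r'stream://.+?\b', link)

# One or more word characters (letters, digits, underscore).
word = re.findall(r'stream://\w+', link)

# Assumption (not from the answer above): stop at the first whitespace
# instead, so IDs containing '-' or '.' are kept whole.
non_space = re.findall(r'stream://\S+', link)

print(lazy)
print(word)
print(non_space)
```

For this sample all three calls return the same single link. They only differ when the ID contains something other than letters, digits, or underscores.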