In a text, I'd like to check whether it contains the following string: "http://p.thisistheurl.com/v/", followed by anything up to "jpg".
So this is the Python code that I wrote:
import re
asdf = 'http://p.thisistheurl.com/v/adzl25/4321567/543276123/865.jpg'
regex = re.compile(r'http://p.thisistheurl.com/v/(.)*jpg')
regex.search(asdf)
<_sre.SRE_Match object; span=(0, 60), match='http://p.thisistheurl.com/v/adzl25/4321567/543276'>
As you can see, the result doesn't show the whole string up to the "jpg". Why doesn't it work?
I don't think there's any guarantee that the characters displayed after match= are actually the complete contents of the string that got matched. The repr probably just cuts off after 50 characters or so.
Looking at CPython's implementation of SRE_Match.__repr__, this is indeed the case: the %.50R in the format string is the smoking gun.
result = PyUnicode_FromFormat(
    "<%s object; span=(%d, %d), match=%.50R>",
    Py_TYPE(self)->tp_name,
    self->mark[0], self->mark[1], group0);
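The effect of that precision can be imitated in Python. This is only a rough sketch, assuming that %.50R amounts to taking repr() of the matched text and keeping at most 50 characters; the names full_repr and shown are made up for illustration.

```python
import re

asdf = 'http://p.thisistheurl.com/v/adzl25/4321567/543276123/865.jpg'
match = re.search(r'http://p.thisistheurl.com/v/(.)*jpg', asdf)

# repr() of the full matched text: the 60 matched characters plus two quotes.
full_repr = repr(match.group(0))

# Assumed equivalent of "%.50R": keep at most 50 characters of that repr.
# The opening quote counts, so only 49 characters of the URL remain visible.
shown = full_repr[:50]

print(len(full_repr))  # 62
print(len(shown))      # 50
print(shown)
```

The span=(0, 60) in the printed representation is not truncated, so it already tells you that the match covers the whole 60-character string.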
If you access the actual matched string, rather than inspecting the match object's printed representation, it goes all the way to jpg:
>>> import re
>>> asdf = 'http://p.thisistheurl.com/v/adzl25/4321567/543276123/865.jpg'
>>> regex = re.compile(r'http://p.thisistheurl.com/v/(.)*jpg')
>>> print(regex.search(asdf).group(0))
http://p.thisistheurl.com/v/adzl25/4321567/543276123/865.jpg
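If you'd rather check this programmatically than by eye, something along these lines should work. It is a small sketch that reuses the pattern from the question; the variable names m and matched are only for illustration.

```python
import re

asdf = 'http://p.thisistheurl.com/v/adzl25/4321567/543276123/865.jpg'
regex = re.compile(r'http://p.thisistheurl.com/v/(.)*jpg')
m = regex.search(asdf)

# group(0) is the entire matched text, independent of how the repr displays it.
matched = m.group(0)

print(matched == asdf)          # True
print(m.span())                 # (0, 60)
print(matched.endswith('jpg'))  # True

# group(1) only holds the last character captured by the repeated (.) group,
# so group(0) is the one to use for the full URL.
print(m.group(1))               # .
```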