I have this example string:
line = '[text] something - https://www.myurl.com/test1/ lorem ipsum https://www.myurl.com/test2/ - https://www.myurl.com/test3/ marker needle - some more text at the end'
I need to extract the path (without slashes) before "marker needle". The following works to list all paths:
print(re.findall(r'https://www\.myurl\.com/(.+?)/', line))
# ['test1', 'test2', 'test3']
However, when I change it to find only the path I want (the one before "marker needle"), the capture starts at the first URL and runs all the way to the last one:
print(re.findall(r'https://www\.myurl\.com/(.+?)/ marker needle', line))
# ['test1/ lorem ipsum https://www.myurl.com/test2/ - https://www.myurl.com/test3']
My expected output:
test3
I have tried the same with re.search, but the result is the same.
This expression has three capturing groups; the second one captures the desired path. The dots are escaped so that they match only literal dots:
(https:\/\/www\.myurl\.com\/)([A-Za-z0-9-]+)(\/\smarker needle)
This tool helps us to modify/change the expression, if you wish.
jex.im visualizes regular expressions.
# -*- coding: UTF-8 -*-
import re

string = "[text] something - https://www.myurl.com/test1/ lorem ipsum https://www.myurl.com/test2/ - https://www.myurl.com/test3/ marker needle - some more text at the end"
# Dots are escaped so they match literal dots only.
expression = r'(https:\/\/www\.myurl\.com\/)([A-Za-z0-9-]+)(\/\smarker needle)'
match = re.search(expression, string)
if match:
    # Group 2 holds the path segment right before "marker needle".
    print("YAAAY! \"" + match.group(2) + "\" is a match 💚💚💚 ")
else:
    print('🙀 Sorry! No matches!')
YAAAY! "test3" is a match 💚💚💚
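As for why the pattern in the question captures too much: the regex engine takes the leftmost position where a match can start, and the lazy (.+?) only minimises how far the group extends from that starting point. It starts at the first URL and grows until "/ marker needle" follows. Restricting the group to characters that cannot be a slash avoids this. The following is a minimal sketch, assuming a path segment never contains "/"; the helper name path_before and its parameters are hypothetical, not part of the original code.

```python
import re

line = ('[text] something - https://www.myurl.com/test1/ lorem ipsum '
        'https://www.myurl.com/test2/ - https://www.myurl.com/test3/ '
        'marker needle - some more text at the end')


def path_before(text, marker, base='https://www.myurl.com/'):
    """Return the path segment directly before `marker`, or None.

    Hypothetical helper for illustration only.
    """
    # [^/]+ cannot cross a slash, so the group cannot run through earlier URLs.
    pattern = re.escape(base) + r'([^/]+)/\s' + re.escape(marker)
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(path_before(line, 'marker needle'))  # test3
```

The character class [A-Za-z0-9-]+ used above works for the same reason: it excludes the slash. [^/]+ is simply more permissive about which characters a path may contain.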
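This JavaScript snippet reports the runtime of a for loop that runs the replacement repeat times (set to 10 below; increase it, for example to 1 million, for a more meaningful benchmark).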
const repeat = 10;
const start = Date.now();
let match;

// Count down from repeat to 1 so the body runs exactly repeat times.
for (let i = repeat; i > 0; i--) {
    const regex = /(.*)(https:\/\/www\.myurl\.com\/)([A-Za-z0-9-]+)(\/\smarker needle)(.*)/gm;
    const str = "[text] something - https://www.myurl.com/test1/ lorem ipsum https://www.myurl.com/test2/ - https://www.myurl.com/test3/ marker needle - some more text at the end";
    const subst = `$3`;
    // Replace the whole string with group 3, the path before "marker needle".
    match = str.replace(regex, subst);
}

const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
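Since the question is about Python, a comparable measurement could be made with the standard timeit module. This is only a rough sketch: the helper name time_search and the repeat count of 10000 are assumptions chosen for illustration, and the timing will vary by machine.

```python
import re
import timeit

line = ('[text] something - https://www.myurl.com/test1/ lorem ipsum '
        'https://www.myurl.com/test2/ - https://www.myurl.com/test3/ '
        'marker needle - some more text at the end')

# Compile once so the loop measures searching, not pattern compilation.
pattern = re.compile(r'https://www\.myurl\.com/([A-Za-z0-9-]+)/\smarker needle')


def time_search(repeat=10000):
    """Return the total seconds taken by `repeat` calls to pattern.search."""
    return timeit.timeit(lambda: pattern.search(line), number=repeat)


elapsed = time_search()
print('{:.4f} s for 10000 searches'.format(elapsed))
```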