How can I identify .onion links in a text, bearing in mind that they can come in a variety of forms:
hfajlhfjkdsflkdsja.onion
http://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
I'm thinking of a regex, but (.*?.onion) would return the whole paragraph in which the link is buried.
This will do it: (?:https?://)?(?:www)?(\S*?\.onion)\b
(Added non-capturing groups - credit: @WiktorStribiżew)
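To make the pieces easier to follow, here is the same pattern written with re.VERBOSE so each part can carry a comment. This is only an annotated sketch; the name ONION_RE and the sample sentence are mine, not part of the original answer.

```python
import re

# Same pattern as above, split across lines with re.VERBOSE.
# ONION_RE is a name chosen for this illustration.
ONION_RE = re.compile(r'''
    (?:https?://)?     # optional scheme: http:// or https://
    (?:www)?           # optional literal "www" (the dot after it is not consumed here)
    (\S*?\.onion)      # shortest run of non-whitespace characters ending in ".onion"
    \b                 # word boundary right after "onion"
''', re.VERBOSE | re.IGNORECASE)

m = ONION_RE.search("see http://www.hfajlhfjkdsflkdsja.onion today")
print(m.group(0))  # http://www.hfajlhfjkdsflkdsja.onion
print(m.group(1))  # .hfajlhfjkdsflkdsja.onion
```

Because \S cannot cross whitespace, the match can never grow to a whole paragraph the way .*? does. Note that group(1) keeps the leading dot when "www" is present; if you need the bare host from the capture group, changing (?:www)? to (?:www\.)? should take care of that.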
Demo:
import re

s = '''hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
https://www.google.com
https://stackoverflow.com'''

# Print each full match; the two non-.onion URLs are skipped.
for m in re.finditer(r'(?:https?://)?(?:www)?(\S*?\.onion)\b', s, re.M | re.IGNORECASE):
    print(m.group(0))
Output:
hfajlhfjkdsflkdsja.onion
https://hfajlhfjkdsflkdsja.onion
http://www.hfajlhfjkdsflkdsja.onion
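Since the question is about links buried in running text, here is a rough sketch of the same pattern applied to a sentence rather than to one URL per line. The helper name find_onion_links and the sample paragraph are made up for the example.

```python
import re

ONION_RE = re.compile(r'(?:https?://)?(?:www)?(\S*?\.onion)\b', re.IGNORECASE)

def find_onion_links(text):
    """Return every .onion match in text, in order of appearance (hypothetical helper)."""
    return [m.group(0) for m in ONION_RE.finditer(text)]

paragraph = ("Mirrors are at http://hfajlhfjkdsflkdsja.onion and "
             "www.hfajlhfjkdsflkdsja.onion, but not at https://www.google.com.")

print(find_onion_links(paragraph))
# ['http://hfajlhfjkdsflkdsja.onion', 'www.hfajlhfjkdsflkdsja.onion']
```

The trailing comma after the second link is left out because \b ends the match right after "onion". One limitation to be aware of: \S*? accepts any non-whitespace characters, so punctuation glued to the front of a link (an opening parenthesis, say) would end up in the match.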