python python-3.x beautifulsoup webrequest

Display the entire string if a partial match is found in a webpage using python and beautifulsoup

I managed to extract what I wanted with the snippet below; however, I think it's problematic. I need help returning the entire string based on the partial match.

import requests
url = "https://bscscan.com/address/0x88c20beda907dbc60c56b71b102a133c1b29b053#code"
queries = ["twitter", "www.", "https://t.me"]

r = requests.get(url)
for q in queries:
    q = q.lower()
    if q in r.text.lower():
        if q.startswith(tuple(queries)):
            print("Found ", q)
        else:
            print("Not Found ", q)

Current Output:

Found  www.
Found  https://t.me

Wanted Output: #-- return the whole string

Found - www.shibuttinu.com
Found - https://t.me/Shibuttinu
Not Found - twitter

Solution

You could build a regular expression from your queries. The following example assumes each whole string starts after a quote or a space and ends at a quote or a newline (which might not always be the case).

import requests
import re

url = "https://bscscan.com/address/0x88c20beda907dbc60c56b71b102a133c1b29b053#code"
r = requests.get(url)

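queries = ["twitter", "www.", "https://t.me"]
# Escape each query so characters such as "." are matched literally
re_queries = '|'.join(re.escape(q) for q in queries)
# Characters allowed inside a whole string (raw string avoids an invalid "\-" escape warning)
valid_url = r"[a-z0-9:/?\-=&.]"
# Opening quote or space, the whole string (captured), then a closing quote or newline
re_query = rf"['\" ]({valid_url}*?({re_queries}){valid_url}*?)['\"\n]"

for match in re.finditer(re_query, r.text, re.I):
    print(match.group(1))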

This would print whole strings such as:

twitter:card
twitter:title
twitter:description
twitter:site
twitter:image
https://www.googletagmanager.com/gtag/js?id=UA-46998878-23
www.shibuttinu.com
https://t.me/shibuttinu
https://www.binance.org/en/smartChain
https://twitter.com/BscScan
Twitter

This locates each of your queries, but only when it is surrounded by characters from the valid set, and only when the whole string starts after a quote or a space and ends at a quote or a newline. The regular expression syntax lets you define these restrictions. The re.I flag makes the matching case-insensitive, so there is no need to lowercase the text first.
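If you also want the per-query "Found" / "Not Found" report from the question, one option is to group the matches by the query they contain. The following is only a sketch under a few assumptions: find_whole_strings is a hypothetical helper name, and sample is a made-up stand-in for r.text so the example runs without a network request. It reuses the same pattern idea as above, with a non-capturing group around the queries.

```python
import re

def find_whole_strings(text, queries):
    """Map each query to the whole strings containing it (case-insensitive)."""
    valid_url = r"[a-z0-9:/?\-=&.]"
    re_queries = "|".join(re.escape(q) for q in queries)
    # Same boundaries as above: opening quote/space, closing quote/newline
    pattern = rf"['\" ]({valid_url}*?(?:{re_queries}){valid_url}*?)['\"\n]"

    found = {q: [] for q in queries}
    for match in re.finditer(pattern, text, re.I):
        whole = match.group(1)
        for q in queries:
            # Record the whole string once under every query it contains
            if q.lower() in whole.lower() and whole not in found[q]:
                found[q].append(whole)
    return found

# Hypothetical sample text standing in for r.text
sample = (
    '<a href="https://t.me/Shibuttinu">Telegram</a>\n'
    "// Website: www.shibuttinu.com\n"
)

results = find_whole_strings(sample, ["twitter", "www.", "https://t.me"])
for query, strings in results.items():
    if strings:
        for s in strings:
            print("Found -", s)
    else:
        print("Not Found -", query)
```

With the real page you would pass r.text instead of sample. Expect extra hits there (for example the twitter: meta tags listed above), so you may still want to tighten valid_url or filter the results.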