I am trying to scrape websites and return their GTM (Google Tag Manager) container IDs. I found a solution, but it only works for one specific website.
It works for https://www.observepoint.com/:
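import re
import urllib3
from bs4 import BeautifulSoup

http = urllib3.PoolManager()
response = http.request("GET", "https://www.observepoint.com/")
soup = BeautifulSoup(response.data, "html.parser")
# Collect only the text nodes inside <head> that mention "GTM"
# (find_all(string=...) is the current spelling of findAll(text=...)).
gtm_text = soup.head.find_all(string=re.compile(r"GTM"))
match = re.search(r"GTM-[A-Z0-9]{6,7}", str(gtm_text))
# re.search returns None when nothing matches, so check before indexing.
print(match[0] if match else None)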
But when I try it on another website, for example https://www.dccomics.com/characters/superman%26sa%3DU%26ved%3D2ahUKEwi55uyMxfHxAhXMp5UCHTkMBekQFjAzegQIARAB%26usg%3DAOvVaw2PgfF7ZT6S6UeZpFImsXDC%2Cdccomics
it doesn't work (re.search finds no match and returns None), even though the GTM ID is still on the page, in the same or a similar iframe tag as on the first website.
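A likely explanation, assuming the site uses Google's standard install snippet: that snippet puts a script in <head> and a <noscript><iframe src=".../ns.html?id=GTM-..."> right after the opening <body> tag. The first script only looks at text nodes inside <head>, so an ID that appears only in an iframe src attribute (an attribute value, not a text node) or elsewhere in <body> is never seen. The sketch below illustrates this with the standard library's html.parser instead of BeautifulSoup so it runs without extra packages; GTMCollector, find_gtm_ids, the regex and the sample page are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Set

# Assumption: a container ID is "GTM-" followed by uppercase letters/digits.
GTM_RE = re.compile(r"\bGTM-[A-Z0-9]+\b")


class GTMCollector(HTMLParser):
    """Collect GTM IDs from text nodes AND from iframe/script src attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.ids: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # An ID inside <iframe src="...ns.html?id=GTM-..."> is an attribute
        # value, so a search over text nodes alone never reaches it.
        if tag in ("iframe", "script"):
            for name, value in attrs:
                if name == "src" and value:
                    self.ids.update(GTM_RE.findall(value))

    def handle_data(self, data):
        # Inline <script> bodies and ordinary text arrive here.
        self.ids.update(GTM_RE.findall(data))


def find_gtm_ids(html: str) -> Set[str]:
    """Scan the whole document, attributes included."""
    collector = GTMCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


# Hypothetical page: the ID appears only in the <body> noscript iframe.
SAMPLE_PAGE = """<html><head><title>Demo</title></head>
<body>
<noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-ABC1234"
 height="0" width="0"></iframe></noscript>
<p>Hello</p>
</body></html>"""

# Crude stand-in for soup.head: everything before </head>.
head_only = SAMPLE_PAGE.split("</head>", 1)[0]
print(GTM_RE.findall(head_only))   # -> []
print(find_gtm_ids(SAMPLE_PAGE))   # -> {'GTM-ABC1234'}
```

The same idea carries over to BeautifulSoup: search the whole soup rather than soup.head, and check iframe src attributes as well as text.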
Running the regular expression over the whole response body, rather than over the text nodes inside <head>, finds an ID on both sites:

import re
import requests

urls = [
    "https://www.observepoint.com/",
    "https://www.dccomics.com/characters/superman%26sa%3DU%26ved%3D2ahUKEwi55uyMxfHxAhXMp5UCHTkMBekQFjAzegQIARAB%26usg%3DAOvVaw2PgfF7ZT6S6UeZpFImsXDC%2Cdccomics",
]

def main(urls):
    for url in urls:
        r = requests.get(url, timeout=10)
        # Search the raw HTML, so IDs in <script> bodies and in iframe src attributes both count
        match = re.findall(r"(GTM-[A-Z0-9]{6,7})", r.text)
        if match:
            print(set(match))

main(urls)
Output:
{'GTM-5LS3NZ'}
{'GTM-538C4X'}
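The loop above can be made a little sturdier. This is a minimal sketch under stated assumptions: it uses the standard library's urllib.request in place of requests (swapped in so it runs without third-party packages), sends a hypothetical browser-like User-Agent header, sets a timeout, and replaces the {6,7} quantifier with + and word boundaries, since a fixed-length quantifier can silently cut a longer ID short. The names fetch_html, extract_gtm_ids and scan are hypothetical; the fetch function is a parameter so extraction can be tested without network access.

```python
import re
import urllib.request
from typing import Callable, Dict, Iterable, List

# Assumption: IDs are "GTM-" plus uppercase letters/digits; \b keeps the
# whole ID instead of truncating it at a fixed length.
GTM_RE = re.compile(r"\bGTM-[A-Z0-9]+\b")


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with the standard library (stand-in for requests)."""
    # Hypothetical User-Agent; some sites reject urllib's default one.
    req = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (gtm-scan sketch)"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_gtm_ids(text: str) -> List[str]:
    """Return unique GTM IDs in order of first appearance."""
    return list(dict.fromkeys(GTM_RE.findall(text)))


def scan(
    urls: Iterable[str], fetch: Callable[[str], str] = fetch_html
) -> Dict[str, List[str]]:
    """Map each URL to the GTM IDs in its raw HTML; failed fetches map to []."""
    results: Dict[str, List[str]] = {}
    for url in urls:
        try:
            results[url] = extract_gtm_ids(fetch(url))
        # URLError, HTTPError and socket timeouts are all OSError subclasses;
        # LookupError covers an unknown charset name in the response headers.
        except (OSError, LookupError) as exc:
            print(f"could not fetch {url}: {exc}")
            results[url] = []
    return results
```

With the urls list from the answer, print(scan(urls)) prints one entry per site; the IDs found depend on what each site serves at the time. Note that searching raw HTML only finds IDs present in the initial response, not ones injected later by JavaScript.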