python beautifulsoup python-requests urllib

Scraping a specific GTAG value from a website

I am trying to scrape websites and return their GTM container IDs. I found a solution, but it only works for one specific website.

It works for https://www.observepoint.com/:

import urllib3
import re
from bs4 import BeautifulSoup
http = urllib3.PoolManager()
response = http.request('GET', "https://www.observepoint.com/")
soup = BeautifulSoup(response.data,"html.parser")
GTM = soup.head.findAll(text=re.compile(r'GTM'))
print(re.search("GTM-[A-Z0-9]{6,7}",str(GTM))[0])

But when I try it on another website, for example https://www.dccomics.com/characters/superman%26sa%3DU%26ved%3D2ahUKEwi55uyMxfHxAhXMp5UCHTkMBekQFjAzegQIARAB%26usg%3DAOvVaw2PgfF7ZT6S6UeZpFImsXDC%2Cdccomics

it doesn't work (it returns a None object), even though the GTM ID is still present, in the same or a similar iframe tag as on the previous website.

[Screenshot: GTM value on the website where the script works]

[Screenshot: GTM value on the website where the script does not work]

Solution

import requests
import re

urls = [
    "https://www.observepoint.com/",
    "https://www.dccomics.com/characters/superman%26sa%3DU%26ved%3D2ahUKEwi55uyMxfHxAhXMp5UCHTkMBekQFjAzegQIARAB%26usg%3DAOvVaw2PgfF7ZT6S6UeZpFImsXDC%2Cdccomics",
]


def main(urls):
    # Search the full response body of each page for GTM container IDs.
    for url in urls:
        r = requests.get(url)
        match = re.findall(r"(GTM-[A-Z0-9]{6,7})", r.text)
        if match:
            # A set drops IDs that appear more than once on the same page.
            print(set(match))


main(urls)

Output:

{'GTM-5LS3NZ'}
{'GTM-538C4X'}
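
A likely reason this works where the original attempt did not: the regex runs over the whole response body, while the question's code only looks at text nodes inside the head element, so an ID that appears only in an iframe URL elsewhere on the page is never seen. The sketch below isolates that idea so it can be tried without any network access. The function name extract_gtm_ids, the sample markup, and the ID GTM-ABC1234 are made up for illustration; they are not taken from either website.

```python
import re

# Container IDs in the examples above look like "GTM-" followed by
# 6 or 7 uppercase letters or digits.
GTM_PATTERN = re.compile(r"GTM-[A-Z0-9]{6,7}")


def extract_gtm_ids(html):
    """Return the distinct GTM container IDs found anywhere in an HTML string."""
    return sorted(set(GTM_PATTERN.findall(html)))


# Hypothetical page: the ID appears only inside an iframe URL in <body>,
# so a search limited to text nodes in <head> would not find it.
sample_html = """
<html>
  <head><title>Example</title></head>
  <body>
    <noscript>
      <iframe src="https://www.googletagmanager.com/ns.html?id=GTM-ABC1234"></iframe>
    </noscript>
  </body>
</html>
"""

print(extract_gtm_ids(sample_html))
```

To use it on a live page, pass in r.text from requests.get(url), as in the answer above.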